02 Introduction

Overview of Application of Python and
Data Analysis in Petroleum Engineering
Prepared By
Archana
• “Data is the new oil” is a common saying nowadays in the areas of
marketing, medical science, economics, finance, any research field,
and the IT industry.
• The significance of oil is derived from the fact that oil companies have
been ruling the globe for decades. Many major global tensions are rooted
in oil.
• Oil has limited availability, whereas data is available in abundance. Oil,
being a tangible product, has an associated cost, whereas data, being a
non-tangible product, has no such associated cost (Adesina, 2018).
• The new oil, i.e., data, is a collection of numbers, words, events, facts,
measurements, and observations. Data, after processing, gives us
information. The information leads to useful knowledge.
• A challenging task is to process these data into information and EDA
(Exploratory data analysis) is the solution to that challenge (Mukhiya
& Ahmed, 2020).
What is Exploratory Data Analysis (EDA)?
• EDA is an approach within data analysis used for gaining a better
understanding of aspects of the data, such as:
Main features of the data variables and the relationships that hold between them.
Identifying which variables are important for the problem at hand.
Various exploratory data analysis methods include:
• Descriptive statistics, which give a brief overview of the dataset we are
dealing with, including some measures and features of the sample
• Grouping data
• ANOVA (analysis of variance), a computational method that divides the
variation in a set of observations into different components
• Correlation and correlation methods
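The four methods listed above can be illustrated with a minimal sketch. It assumes a small, hypothetical set of well records (the formation names, porosity, and permeability values are invented for illustration) and uses only the Python standard library; in practice, libraries such as pandas and SciPy are the more common tools for these steps.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical well records: (formation, porosity in %, permeability in mD).
wells = [
    ("sandstone", 18.0, 120.0),
    ("sandstone", 21.0, 180.0),
    ("sandstone", 24.0, 260.0),
    ("carbonate", 9.0, 15.0),
    ("carbonate", 12.0, 40.0),
    ("carbonate", 15.0, 65.0),
]
porosity = [w[1] for w in wells]
permeability = [w[2] for w in wells]

# 1. Descriptive statistics: a brief numeric overview of one variable.
summary = {
    "count": len(porosity),
    "mean": mean(porosity),
    "median": median(porosity),
    "std": stdev(porosity),
    "min": min(porosity),
    "max": max(porosity),
}

# 2. Grouping: collect porosity values per formation type.
groups = defaultdict(list)
for formation, phi, _ in wells:
    groups[formation].append(phi)
group_means = {name: mean(values) for name, values in groups.items()}


# 3. One-way ANOVA: split the variation into between-group and within-group parts.
def anova_f(samples):
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within)


# 4. Correlation: Pearson coefficient between two variables.
def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


f_stat = anova_f(list(groups.values()))
r = pearson_r(porosity, permeability)

print(summary)
print(group_means)
print(f"F = {f_stat:.2f}, r = {r:.3f}")
```

A large F statistic suggests that the group means differ by more than the scatter within each group would explain, and r close to 1 indicates a strong positive linear relationship between the two variables.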
Exploratory Data Analysis
• EDA is a process through which an available dataset is examined to
discover patterns, detect any irregularities, test hypotheses, and
statistically analyze assumptions.
• The main purpose of EDA is to understand what the given data tells
before modeling or formulating hypotheses.
• EDA was promoted by John Tukey to statisticians (Mukhiya &
Ahmed, 2020).
• Contemplating data requirements, data collection, data processing, and
data cleaning are the stages that precede EDA.
• An appropriate decision needs to be made from the data collected
about different fields which are primarily stored in electronic
databases. Data mining is the process that gives an insight into the raw
data and EDA forms the first stage of Data mining.
• Different approaches towards data analysis
• There are several approaches to data analysis. Three important
approaches, viz. classical data analysis, exploratory data analysis,
and Bayesian data analysis, are shown in the
following figure
Stages of EDA
Mukhiya & Ahmed (2020) put forth four stages of EDA,
which are:
1. Definition of the problem – To define a problem, it is important to
define the primary objective of the analysis alongside defining main
deliverables, roles, and responsibilities, the present state of the data,
setting a timeline, and analyzing the cost-to-benefit ratio.
2. Preparation of data – In this stage, the characteristics of the data are
understood, the dataset is cleaned, and irrelevant data are deleted.
3. Analyzing the data – In this stage, the data are summarized, hidden
correlations are derived, predictive models are developed and evaluated,
and summary tables are generated.
4. Results representation – Finally, the dataset is presented to the target
audience in the form of graphs, and summary tables.
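The four stages can be sketched end to end on a toy example. The objective, well names, rates, and the free-text column below are all hypothetical and serve only to show where each stage fits; a real project would involve far more work at every step.

```python
from statistics import mean

# Stage 1 - definition of the problem (hypothetical objective).
objective = "Summarize the average daily oil rate per well"

# Hypothetical raw records; None marks a missing measurement.
raw = [
    {"well": "W-1", "oil_rate": 510.0, "operator_note": "ok"},
    {"well": "W-1", "oil_rate": None, "operator_note": "gauge offline"},
    {"well": "W-1", "oil_rate": 490.0, "operator_note": "ok"},
    {"well": "W-2", "oil_rate": 300.0, "operator_note": "ok"},
    {"well": "W-2", "oil_rate": 320.0, "operator_note": "ok"},
]

# Stage 2 - preparation: drop rows with missing rates and the irrelevant note column.
clean = [
    {"well": row["well"], "oil_rate": row["oil_rate"]}
    for row in raw
    if row["oil_rate"] is not None
]

# Stage 3 - analysis: summarize the cleaned data per well.
rates_by_well = {}
for row in clean:
    rates_by_well.setdefault(row["well"], []).append(row["oil_rate"])
summary = {
    well: {"n": len(rates), "mean": mean(rates)}
    for well, rates in rates_by_well.items()
}

# Stage 4 - results representation: a small summary table for the audience.
print(objective)
print(f"{'well':<6}{'n':>3}{'mean rate':>12}")
for well, stats in summary.items():
    print(f"{well:<6}{stats['n']:>3}{stats['mean']:>12.1f}")
```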
References
• 1. Adesina, A. (2018). Data is the new oil. Retrieved
from https://medium.com/@adeolaadesina/data-is-the-new-oil-2947ed8804f6
• 2. Mukhiya, S. K., & Ahmed, U. (2020). Hands-on Exploratory Data Analysis
with Python. Mumbai: Packt.
Crude Oil Consumption Forecasting Using Classical and Machine Learning
Methods (Ref-1)
• The global oil market is the most important of all the world energy markets.
Since crude oil is a non-renewable source, its quantity is fixed and limited.
• To manage the available oil reserves, it will be helpful if we have an estimation
of the future consumption requirements of this resource beforehand.
• This paper describes methods to forecast crude oil consumption for the next 5
years using the past 17 years’ data (2000-2017).
• The decision-making process comprised: (1) Preprocessing of the dataset,
(2) Designing the forecasting model, (3) Training the model, (4) Testing the model on
the test set, and (5) Forecasting results for the next 5 years.
• The proposed methods are divided into two categories: (a) Classical methods,
and (b) Machine Learning methods. These were applied to global data as well
as to three major countries: (a) the USA, (b) China, and (c) India.
• The results showed that the best accuracy was obtained for polynomial
regression. An accuracy of 97.8% was obtained.
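The five-step workflow with polynomial regression can be sketched as follows. This is only an illustration under several assumptions: the consumption series is synthetic (not the paper's data), the polynomial degree is set to 2, the last four years are held out as the test set, and "accuracy" is taken to mean 100 minus the mean absolute percentage error, which the summary above does not confirm. It relies on NumPy's `polyfit` and `poly1d`.

```python
import numpy as np

# Step 1 - preprocessing: a synthetic yearly consumption series (arbitrary units).
years = np.arange(2000, 2018)          # 2000..2017 inclusive
t = years - 2000                       # shift years so the fit is well conditioned
consumption = 76.0 + 1.1 * t + 0.02 * t**2 + 0.3 * np.sin(t)

# Step 2 - model design: degree-2 polynomial; hold out the last four years.
degree = 2
t_train, y_train = t[:-4], consumption[:-4]
t_test, y_test = t[-4:], consumption[-4:]

# Step 3 - training: least-squares polynomial fit on the training years.
model = np.poly1d(np.polyfit(t_train, y_train, deg=degree))

# Step 4 - testing: accuracy assumed here to be 100 - MAPE on the held-out years.
mape = np.mean(np.abs((y_test - model(t_test)) / y_test)) * 100
accuracy = 100 - mape

# Step 5 - forecasting: extrapolate the fitted curve five years ahead.
future_years = np.arange(2018, 2023)
forecast = model(future_years - 2000)

print(f"test accuracy (100 - MAPE): {accuracy:.1f}%")
for year, value in zip(future_years, forecast):
    print(year, round(float(value), 2))
```

Extrapolating a polynomial beyond the fitted range can diverge quickly, so a forecast of this kind is best read as a trend indication rather than a firm estimate.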
Artificial Intelligence-Based Prediction of Crude Oil Prices Using Multiple
Features under the Effect of Russia–Ukraine War and COVID-19 Pandemic
(Ref-2)
• The effect of the COVID-19 pandemic on crude oil prices just faded; at this
moment, the Russia–Ukraine war brought a new crisis.
• In this paper, a new application is developed that predicts the change in
crude oil prices by incorporating these two global effects.
• Unlike most existing studies, this work uses a dataset that involves data
collected over twenty-two years and contains seven different features, such
as crude oil opening, closing, intraday highest value, and intraday lowest
value.
• This work applies cross-validation to predict the crude oil prices by using
machine learning algorithms (support vector machine, linear regression, and
random forest) and deep learning algorithms (long short-term memory and
bidirectional long short-term memory).
• The results obtained by machine learning and deep learning algorithms are
compared. Lastly, the high-performance estimation can be achieved in this
work with the average mean absolute error value over 0.3786.
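How cross-validation yields an average mean absolute error (MAE) can be shown with a minimal sketch. It assumes a hypothetical set of opening and closing prices and a one-feature least-squares line standing in for the paper's models; the helper names `fit_line` and `k_fold_mae` are illustrative, not taken from the paper.

```python
from statistics import mean

# Hypothetical daily records: (opening price, closing price) in USD per barrel.
records = [
    (70.0, 70.8), (71.0, 71.5), (72.5, 73.4), (74.0, 74.6),
    (73.0, 73.9), (75.5, 76.1), (77.0, 77.9), (76.0, 76.6),
    (78.5, 79.4), (80.0, 80.7),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def k_fold_mae(data, k):
    """Average MAE over k folds; each record is used for testing exactly once."""
    fold_maes = []
    for i in range(k):
        test = data[i::k]  # every k-th record, starting at index i
        train = [row for j, row in enumerate(data) if j % k != i]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        fold_maes.append(mean(abs(y - (a + b * x)) for x, y in test))
    return mean(fold_maes)


avg_mae = k_fold_mae(records, k=5)
print(f"average MAE over 5 folds: {avg_mae:.4f}")
```

For time-ordered price data, interleaved folds like these let the model see records that come after the test points, so a forward-chaining split is often the more cautious choice.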
A subsurface machine learning approach at hydrocarbon production recovery & resource
estimates for unconventional reservoir systems: Making subsurface predictions from
multidimensional data analysis (Ref-3)
• An innovative, practical, and successful subsurface machine learning workflow was
introduced that utilizes any structured reservoir, geologic, engineering and production
data.
• This workflow is colloquially called the Artificial Learning Integrated Characterization
Environment (ALICE), and it has changed the way Chevron manages its tight rock and
unconventional assets.
• The workflow guides users from framing and data gathering to geospatial assembly,
quality control and ingestion, then on through machine-learning feature selection,
modeling, validation, and acceptance for results reporting.
• The ultimate products of the workflow can be visualized in either map or log (depth) space
to help identify key areas for well optimization or landing zones, respectively.
• The results from ALICE have been used within Chevron to aid in exploration review
assessments, type curve adjustments, landing strategies, well performance lookbacks and
more.
• A real-data example of the workflow is presented from start to finished product for the
Midland Basin Wolfcamp A, a maturely developed unconventional reservoir. The ALICE
workflow and products were developed through close cross-functional collaboration
between business units, data science, and research components of the corporation.
Prediction method and application of shale reservoirs core gas content based on machine learning
(Ref-4)
• To improve the recovery of shale gas, it is very important to accurately grasp the gas content of
shale reservoirs.
• Traditional methods such as empirical formulas and regression fitting suffer from low accuracy,
strong local limitation, and poor adaptability to seismic data.
• To address these problems, an intelligent prediction method for shale reservoir core gas content
was established using machine learning (ML) algorithms such as support vector regression, decision
trees, random forests, BP neural networks and convolutional neural networks, with three input
parameters: P-wave velocity (Vp), S-wave velocity (Vs) and density (RHOB).
• It was compared with traditional method prediction, core tests and other gas content data, which
verified the effectiveness and high-precision characteristics of the method. Additionally, in support
vector regression (SVR), decision tree (DT), random forest (RF), BP neural network (BP),
convolutional neural network (CNN), and other machine learning algorithms, the support vector
regression algorithm was the most stable, robust and accurate.
• Because it takes the easy-to-obtain “three parameters” as input data and retains the gas content
characteristics of core data, it also has strong generalization ability and easy migration advantages,
which can be easily extended to gas content prediction of three-dimensional shale reservoirs based
on seismic inversion data.
• The prediction results of this method for the core gas content of the shale reservoir of the
Wufeng–Longmaxi Formation in the southern Sichuan Basin show that, compared with the traditional
method, the gas content prediction accuracy based on the machine learning algorithm was higher.
Therefore, it can provide method support for shale reservoir target optimization and drilling
deployment.
Studying the direction of hydraulic fracture in carbonate reservoirs: Using machine
learning to determine reservoir pressure (Ref-5)
• Hydraulic fracturing (HF) is an effective way to intensify oil production, which is
currently widely used in various conditions, including complex carbonate reservoirs.
• In the conditions of the field under consideration, hydraulic fracturing leads to a
significant differentiation of technological efficiency indicators, which makes it
expedient to study the patterns of crack formation in detail. Studies were carried out
for all wells, which were considered the objects of impact, to assess the spatial
orientation of the cracks formed.
• The developed indirect method was used for this purpose, the reliability of which was
confirmed by geophysical methods.
• During the analysis, it was found that in all cases, the crack is oriented in the direction
of the section of the development system element characterized by the
maximum reservoir pressure. At the same time, the reservoir pressure values for all
wells were determined at one point in time (at the beginning of HF) using machine
learning methods.
• The reliability of the machine learning methods used is confirmed by the high
convergence with the actual (historical) reservoir pressures obtained during
hydrodynamic studies of wells. The obtained conclusion about the influence of the
reservoir pressure on the patterns of fracture formation should be taken into account
when planning hydraulic fracturing under the conditions studied.
Machine-learning-assisted high-temperature reservoir thermal energy storage
optimization (Ref-6)
• High-temperature reservoir thermal energy storage (HT-RTES) has the potential to become an
indispensable component in achieving the goal of a net-zero carbon economy, given its
capability to balance the intermittent nature of renewable energy generation.
• In this study, a machine-learning-assisted computational framework is presented to identify
HT-RTES sites with optimal performance metrics by combining physics-based simulation
with stochastic hydrogeologic formation and thermal energy storage operation
parameters, artificial neural network regression of the simulation data, and genetic algorithm-
enabled multi-objective optimization.
• A doublet well configuration with a layered (aquitard-aquifer-aquitard) generic reservoir is
simulated for cases of continuous operation and seasonal-cycle operation scenarios.
• Neural network-based surrogate models are developed for the two scenarios and applied to
generate the Pareto fronts of the HT-RTES performance for four potential HT-RTES sites.
• The developed Pareto optimal solutions indicate the performance of HT-RTES is operation-
scenario (i.e., fluid cycle) and reservoir-site dependent and the performance metrics have
competing effects for a given site and a given fluid cycle. The developed neural network
models can be applied to identify suitable sites for HT-RTES, and the proposed framework
sheds light on the design of resilient HT-RTES systems.
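The idea of a Pareto front for competing performance metrics can be sketched in a few lines. The two metrics used here (a recovery efficiency and a stored-energy figure, both to be maximized) and all candidate values are hypothetical; the paper's actual metrics and its genetic-algorithm search are not reproduced.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Both objectives are maximized. A point q dominates p when q is at least
    as good as p in both objectives and differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical candidate designs: (recovery efficiency, stored energy in GWh).
candidates = [
    (0.60, 40.0),
    (0.70, 35.0),
    (0.65, 30.0),
    (0.80, 20.0),
    (0.55, 38.0),
    (0.75, 20.0),
]

front = pareto_front(candidates)
print(front)
```

Each design on the front improves one metric only at the expense of the other, which is the "competing effects" behaviour described above.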
Application of machine learning to predict CO2 trapping performance in deep
saline aquifers (Ref-7)
• Deep saline formations are considered potential sites for geological carbon storage. To
better understand the CO2 trapping mechanism in saline aquifers, it is necessary to
develop robust tools to evaluate CO2 trapping efficiency.
• This paper introduces the application of Gaussian process regression (GPR), support
vector machine (SVM), and random forest (RF) to predict CO2 trapping efficiency in
saline formations.
• First, the uncertainty variables, including geologic parameters, petrophysical properties,
and other physical characteristics data, were utilized to create a training dataset.
• In total, 101 reservoir simulations were then performed, and residual trapping, solubility
trapping, and cumulative CO2 injection were analyzed.
• The results indicated that all three machine learning (ML) models, ranked by
performance from high to low as GPR, SVM, and RF, can be used to predict the
CO2 trapping efficiency in deep saline formations.
• The GPR model had an excellent CO2 trapping prediction efficiency with the highest
correlation factor (R2 = 0.992) and the lowest root mean square error (RMSE = 0.00491).
Also, the predictive models obtained good agreement between the simulated field and
predicted trapping index. These findings indicate that the GPR ML models can support
the numerical simulation as a robust predictive tool for estimating the performance of
CO2 trapping in the subsurface.
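The two reported metrics can be computed as in the sketch below. It assumes that R2 denotes the coefficient of determination (the slide calls it a "correlation factor", so this reading is an assumption) and uses hypothetical trapping-index values rather than the paper's simulation results.

```python
from math import sqrt


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical trapping-index values: simulated (reference) vs. ML-predicted.
simulated = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.19, 0.31, 0.38]

print(f"R2 = {r_squared(simulated, predicted):.3f}")
print(f"RMSE = {rmse(simulated, predicted):.5f}")
```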
Machine Learning-Assisted Prediction of Oil Production and CO2 Storage Effect
in CO2-Water-Alternating-Gas Injection (CO2-WAG) (Ref-8)
• In recent years, CO2 flooding has emerged as an efficient method for improving oil recovery. It also has the
advantage of storing CO2 underground. As one of the promising types of CO2 enhanced oil recovery (CO2-EOR),
CO2 water-alternating-gas injection (CO2-WAG) can suppress CO2 fingering and early breakthrough problems that
occur during oil recovery by CO2 flooding.
• However, the evaluation of CO2-WAG is strongly dependent on the injection parameters, which in turn renders
numerical simulations computationally expensive. So, in this work, machine learning is used to help predict how
well CO2-WAG will work when different injection parameters are used.
• A total of 216 models were built by using CMG numerical simulation software to represent CO2-WAG development
scenarios of various injection parameters where 70% of them were used as training sets and 30% as testing sets.
• A random forest regression algorithm was used to predict CO2-WAG performance in terms of oil production, CO2
storage amount, and CO2 storage efficiency. The CO2-WAG period, CO2 injection rate, and water–gas ratio were
chosen as the three main characteristics of injection parameters.
• The prediction results showed that the predicted value of the test set was very close to the true value. The average
absolute prediction deviations of cumulative oil production, CO2 storage amount, and CO2 storage efficiency were
1.10%, 3.04%, and 2.24%, respectively.
• Furthermore, it only takes about 10 s to predict the results of all 216 scenarios by using machine learning methods,
while the CMG simulation method spends about 108 min. It demonstrated that the proposed machine-learning
method can rapidly predict CO2-WAG performance with high accuracy and high computational efficiency under
conditions of various injection parameters.
• This work gives more insights into the optimization of the injection parameters for CO2-EOR.
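The scenario set-up and the reported figures can be worked through in a short sketch. It assumes that the 216 scenarios form a full 6 x 6 x 6 grid over the three injection parameters, which the summary above does not state; the parameter levels, the shuffle seed, and the helper name are likewise hypothetical.

```python
import itertools
import random

# Hypothetical parameter levels (6 x 6 x 6 = 216 combinations).
wag_periods = [1, 2, 3, 4, 6, 12]                    # months
co2_rates = [10, 20, 30, 40, 50, 60]                 # arbitrary rate units
water_gas_ratios = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

scenarios = list(itertools.product(wag_periods, co2_rates, water_gas_ratios))

# 70 % / 30 % split into training and testing sets, shuffled reproducibly.
rng = random.Random(0)
rng.shuffle(scenarios)
n_train = round(0.7 * len(scenarios))
train_set, test_set = scenarios[:n_train], scenarios[n_train:]


def mean_abs_pct_deviation(true_values, predicted_values):
    """Average absolute deviation of predictions, as a percentage of the true value."""
    deviations = [abs(p - t) / abs(t) for t, p in zip(true_values, predicted_values)]
    return 100 * sum(deviations) / len(deviations)


# Speed-up implied by the reported timings: 108 min of simulation vs. about 10 s.
speedup = (108 * 60) / 10

print(len(scenarios), len(train_set), len(test_set))
print(mean_abs_pct_deviation([100.0, 200.0], [101.0, 198.0]))
print(f"speed-up: about {speedup:.0f}x")
```

With 216 scenarios, a 70/30 split gives 151 training and 65 testing cases, and the reported timings correspond to a speed-up of roughly 648 times.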
Prediction of coal wettability using machine learning for the
application of CO2 sequestration (Ref-9)
• Carbon capture, utilization, and storage (CCUS) is an essential greenhouse gas-reducing technology that can be employed
throughout the energy system. Carbon dioxide (CO2) sequestration in underground strata is one of the efficient ways of
reducing carbon emissions. CO2 sequestration in coal formations can be used to improve the methane recovery from coal
formations (enhanced coalbed methane, ECBM).
• The efficiency of this process depends strongly on the wettability of the coal in contact with CO2. Different experimental
methods, including contact angle (CA) measurements, can be used to estimate the wettability.
• However, the experimental techniques are expensive, inconsistent, and time-consuming. Therefore, this study introduces the
application of artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) to estimate the CA in
the coal–water–CO2 system.
• ANN and ANFIS models were built using a 250-point dataset to calculate the contact angle of the coal formation. The input
parameters were the coal properties, operating pressure, and temperature. 70% of the dataset was used to train the model,
while 30% of the data was used for the testing process.
• The models were then validated with a set of unseen data. The results showed that ANN and ANFIS models accurately
predicted the contact angle in the coal–water–CO2 system as a function of coal properties and the operating conditions.
• The correlation coefficient (R) and the average absolute percent error (AAPE) between the actual and estimated contact
angle were used as indicators for the model performance.
• ANN and ANFIS models predicted the contact angle with R values higher than 0.96 for the different datasets. AAPE was
less than 7% in both models for the training and testing datasets. An empirical equation was built using the weights and
biases from the developed ANN model. The new equation was validated with the unseen dataset, and the R-value was
found to be higher than 0.96 with an AAPE less than 6%. These results confirm the reliability of the proposed models to get
the contact angle in the coal formation without laboratory work or complex calculations. These models can be used to
screen the coal formation targets for carbon storage.
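The two performance indicators, R and AAPE, can be computed as sketched below. The contact-angle values are hypothetical, and R is taken to be the Pearson correlation coefficient, which is the usual reading but is an assumption here.

```python
from math import sqrt


def correlation_coefficient(actual, estimated):
    """Pearson correlation coefficient R between actual and estimated values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / sqrt(var_a * var_e)


def aape(actual, estimated):
    """Average absolute percent error: mean of |actual - estimated| / |actual|, in %."""
    errors = [abs(a - e) / abs(a) for a, e in zip(actual, estimated)]
    return 100 * sum(errors) / len(errors)


# Hypothetical contact angles in degrees: measured vs. model-estimated.
measured = [40.0, 60.0, 80.0, 100.0, 120.0]
estimated = [42.0, 57.0, 84.0, 95.0, 126.0]

print(f"R = {correlation_coefficient(measured, estimated):.3f}")
print(f"AAPE = {aape(measured, estimated):.1f}%")
```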
References
1. Fatima, Z., Kumar, A., Bhargava, L. and Saxena, A., 2019. Crude oil consumption forecasting using classical and
machine learning methods. International Journal of Knowledge-Based Computer Systems, 7(1), pp.10-18.
2. Jahanshahi, H., Uzun, S., Kaçar, S., Yao, Q. and Alassafi, M.O., 2022. Artificial intelligence-based prediction of crude
oil prices using multiple features under the effect of Russia–Ukraine war and COVID-19
pandemic. Mathematics, 10(22), p.4361.
3. Prochnow, S.J., Raterman, N.S., Swenberg, M., Reddy, L., Smith, I., Romanyuk, M. and Fernandez, T., 2022. A
subsurface machine learning approach at hydrocarbon production recovery & resource estimates for unconventional
reservoir systems: Making subsurface predictions from multidimensional data analysis. Journal of Petroleum Science and
Engineering, 215, p.110598.
4. Luo, S., Xu, T. and Wei, S., 2022. Prediction method and application of shale reservoirs core gas content based on
machine learning. Journal of Applied Geophysics, 204, p.104741.
5. Martyushev, D.A., Ponomareva, I.N. and Filippov, E.V., 2023. Studying the direction of hydraulic fracture in carbonate
reservoirs: Using machine learning to determine reservoir pressure. Petroleum Research, 8(2), pp.226-233.
6. Jin, W., Atkinson, T.A., Doughty, C., Neupane, G., Spycher, N., McLing, T.L., Dobson, P.F., Smith, R. and Podgorney,
R., 2022. Machine-learning-assisted high-temperature reservoir thermal energy storage optimization. Renewable
Energy, 197, pp.384-397.
7. Thanh, H.V. and Lee, K.K., 2022. Application of machine learning to predict CO2 trapping performance in deep saline
aquifers. Energy, 239, p.122457.
8. Li, H., Gong, C., Liu, S., Xu, J. and Imani, G., 2022. Machine learning-assisted prediction of oil production and CO2
storage effect in CO2-water-alternating-gas injection (CO2-WAG). Applied Sciences, 12(21), p.10958.
9. Ibrahim, A.F., 2022. Prediction of coal wettability using machine learning for the application of CO2
sequestration. International Journal of Greenhouse Gas Control, 118, p.103670.
THANK YOU
