Improving Permeability Prediction Via Machine Learning

Environmental Earth Sciences (2024) 83:244
https://doi.org/10.1007/s12665-024-11534-0
ORIGINAL ARTICLE
Improving permeability prediction via Machine Learning in a heterogeneous carbonate reservoir: application to Middle Miocene Nullipore, Ras Fanar field, Gulf of Suez, Egypt
Mostafa S. Khalid1 · Ahmed S. Mansour2 · Saad El‑Din M. Desouky1 · Walaa S. M. Afify3 · Sayed F. Ahmed4 ·
Osama M. Elnaggar1
Received: 9 October 2023 / Accepted: 3 March 2024

© The Author(s) 2024
Abstract
Predicting and interpolating the permeability between wells to obtain the 3D distribution is a challenging task in reservoir simulation. The high degree of heterogeneity and diagenesis in the Nullipore carbonate reservoir poses a significant obstacle to accurate prediction. Moreover, intricate relationships between core and well logging data exist in the reservoir. This study presents a novel approach based on Machine Learning (ML) to overcome such difficulties and build a robust permeability predictive model. The main objective of this study is to develop an ML-based permeability prediction approach to predict permeability logs and populate the predicted logs to obtain the 3D permeability distribution of the reservoir. The methodology involves grouping the reservoir cored intervals into flow units (FUs), each of which has distinct petrophysical characteristics. The probability density function is used to investigate the relationships between the well logs and FUs to select high-weighted input features for reliable model prediction. Five ML algorithms, including Linear Regression (LR), Polynomial Regression (PR), Support Vector Regression (SVR), Decision Trees (DeT), and Random Forests (RF), have been implemented to integrate the core permeability with the influential well logs to predict permeability. The dataset is randomly split into training and testing sets to evaluate the performance of the developed models. The models' hyperparameters were tuned to improve prediction performance. To predict permeability logs, two key wells containing all the reservoir FUs are used to train the most accurate ML model, and the other wells are used to test its performance. Results indicate that the RF model outperforms all other ML models and offers the most accurate results: the adjusted coefficient of determination (R2adj) between the predicted permeability and core permeability is 0.87 for the training set and 0.82 for the testing set, and the mean absolute error (MAE) and mean squared error (MSE) are 0.32 and 0.19, respectively, for both sets. It was observed that the RF model exhibits high prediction performance when it is trained on wells containing all the reservoir FUs. This approach aids in detecting patterns between the well logs and permeability along the profile of wells and capturing the wide permeability distribution of the reservoir. Ultimately, the predicted permeability logs were populated via the Gaussian Random Function Simulation geostatistical method to build a 3D permeability distribution for the reservoir. The study outcomes will help users of ML make informed choices about the appropriate ML algorithms to use in carbonate reservoir characterization, for more accurate permeability predictions and better decision-making with limited available data.
Keywords Permeability prediction · Nullipore carbonate reservoir · Heterogeneity · Flow units · Machine learning ·
Hyperparameters tuning
* Mostafa S. Khalid, mostafashafik579@gmail.com
1 Production Department, Egyptian Petroleum Research Institute, Cairo, Egypt
2 Faculty of Science, Alexandria University, Alexandria, Egypt
3 Faculty of Education, Alexandria University, Alexandria, Egypt
4 Rashpetco, Cairo, Egypt

Abbreviations
AI  Artificial intelligence
ANN  Artificial neural networks
D  Density
DeT  Decision trees
DT  Delay time recorded by sonic log
FZI  Flow zone indicator
GRFS  Gaussian Random Function Simulation
HFUs  Hydraulic flow units
LR  Linear regression
MAE  Mean absolute error
ML  Machine learning
MSE  Mean squared error
PDF  Probability density function
PR  Polynomial regression
phiE  Effective porosity
RF  Random forests
RQI  Reservoir quality index
SVR  Support vector regression
Vminerals  Volume of minerals

Symbols
Fs  Pore shape factor
LK  Logarithm of permeability
R  Pearson coefficient of correlation
R2adj  Adjusted coefficient of determination
mright/left  Number of instances in the right/left subset of Decision Tree
Svgr  Specific area per unit grain volume
T  Tortuosity
∆t  Interval transit time, µs/ft
Φ  Porosity, fraction
Φz  Normalized porosity

Introduction

Permeability is a crucial petrophysical parameter for reservoir simulation. It is essential for selecting the optimum development scenarios. It can be measured via core sampling or pressure testing methods. However, these methods are limited, costly, and time-consuming. Thus, several researchers tried integrating permeability with well logs to obtain a continuous permeability profile along the reservoir. Kozeny (1927) and Carman (1937) created the permeability correlation with formation porosity. This model assumes that pores are cylindrical. However, pore shape can vary from one unit to another within the reservoir. Moreover, the model ignores the lack of a basic relationship between porosity and permeability, as zones with equal porosity but different permeability exist in the same reservoir. Besides, some carbonate reservoirs have low porosity and high permeability due to fractures. Both depositional factors (such as pore geometry, grain size, and tortuosity) and diagenetic ones (such as cementation, dissolution, and fracturing) affect permeability. Subsequently, a more reliable approach was needed to take into account the fundamentals of geology and flow physics in porous media.

Wendt et al. (1986), Balan et al. (1995), and Xue et al. (1997) reported that integrating permeability with logs other than porosity and using multiple linear regression analysis increases the correlation between measured and predicted permeability. This approach assumes a linear relationship between permeability and influential well logs. However, this is not the case in many reservoirs, especially in heterogeneous ones, where non-linear and complex relationships exist between well logs and permeability.

To overcome such challenges, Machine Learning (ML) has been extensively introduced and tested as a regression tool for the prediction of reservoir permeability from well logs (Akande et al. 2017; Zhu et al. 2017; Elkatatny et al. 2018; Okon et al. 2021; Male et al. 2020; Lv et al. 2021; Kamali et al. 2022; Abbas et al. 2023; Matinkia et al. 2023; Mahdy et al. 2024).

In some reservoirs, categorizing the reservoir into hydraulic flow units (HFUs), where each unit has geological and petrophysical characteristics different from the others, improves the permeability prediction. Amaefule et al. (1993) presented the Flow Zone Indicator (FZI) concept to group the reservoir into HFUs, where each unit has a similar FZI. Several researchers integrated the FZI with well logs via multiple regression analysis, artificial neural networks (ANN), and adaptive neuro-fuzzy inference systems to estimate the FZI from logs, and thereafter computed permeability from FZI (Aminian et al. 2003; Kharrat et al. 2009; Khalid et al. 2020; Alizadeh et al. 2022; Djebbas et al. 2023).

Although ML-based models have been extensively used for permeability prediction, there are significant challenges and shortcomings in the application of these models to heterogeneous reservoirs. Most ML studies did not present a systematic approach to predict permeability in complex carbonate reservoirs along the wells profile to be further populated in 3D. In the present study, a systematic novel approach based on Machine Learning is presented to predict the permeability logs in the Nullipore heterogeneous carbonate reservoir in Ras Fanar field using core and well log data. In this approach, the reservoir is divided into different HFUs. Five Machine Learning algorithms, namely Linear Regression (LR), Polynomial Regression (PR), Support Vector Regression (SVR), Decision Trees (DeT), and Random Forests (RF), were applied to integrate the core permeability data with the well logging data, and their performance was evaluated. The most accurate algorithm was applied to predict permeability logs along the profile of wells. Contrary to the existing models in the literature, the new approach presented in this study is applied to two key wells containing all the reservoir HFUs to train the algorithm. Other wells are used as blind test wells for model validation. This approach
enables the model to detect the patterns between the input and output data for the whole permeability variation range, hence capturing the permeability heterogeneity of the reservoir. The study aims to (a) identify the reservoir HFUs from core data, (b) analyze the quality of the reservoir in terms of storage and flow capacities, and (c) develop an ML-based permeability prediction approach to predict permeability logs along the wells profile and populate the predicted logs to obtain the 3D permeability distribution of the reservoir. Furthermore, the study provides a reference for predicting permeability in complex carbonate reservoirs, which can be followed in other areas that have similar geological conditions.

Location and geology of the study area

The most prolific hydrocarbon province in Egypt is the Gulf of Suez rift basin. It holds 80% of the oil reserves and provides 75% of the oil production in Egypt. Ras Fanar field is an oil field located on the offshore western side of the central province of the Gulf of Suez (G.O.S), Egypt (Fig. 1). The field lies 3.5 km east of Ras Gharib shoreline in the Belayim dip province of northeast direction, between latitudes 28° 13ʹ and 28° 18ʹ to the north, and longitudes 33° 11ʹ and 33° 17ʹ to the east. The Middle Miocene Nullipore reservoir is the main oil-producing unit in the field. It is equivalent to the Hammam Faraun Member of the Belayim Formation. The Belayim Formation represents a part of the Middle Miocene syn-rift succession in the Ras Fanar field (Moustafa 1976).

Fig. 1 Location map of Ras Fanar field

Three depositional sequences characterize the rift stratigraphy of G.O.S, as follows: pre-rift sequence (Paleozoic to Late Eocene), syn-rift (Early to Late Miocene), and post-rift (Post-Miocene). The Ras Fanar field produces from an NW–SE structural trap bounded by a major fault system to the SW and tilted to NE. The Ras Fanar area sedimentary succession ranges in age from Paleozoic to recent, as shown in Fig. 2. The syn-rift succession is represented by the Belayim Formation at the base, the South Gharib Formation in the middle, and the Zeit Formation at the top (Souaya 1965; Hosny et al. 1986; Rateb 1988).

Fig. 2 Lithostratigraphic column of Ras Fanar field. Modified after (El Naggar 1988)

The Nullipore represents the main carbonate reservoir unit in the Ras Fanar field. It contributes about half of the field production. Thiébaud and Robson (1979) introduced the Nullipore reservoir as bioclastic Limestone exposed at the G.O.S region, with frequent occurrence of reefs, corals, and algae.

As a result of the rifting of the Suez Gulf, several stratigraphic successions were developed on both sides of the Gulf. Moreover, many grabens, half-grabens, and horst blocks were created. Ras Fanar was a positive highland area during the Early Miocene and was subjected to severe erosion. In the adjacent lowland troughs, a thick organic-rich shale sequence was deposited. The area was submerged by a shallow sea during the Early Middle Miocene. This allowed the accumulation of thick algal-reefal carbonate facies of the Nullipore reservoir. More arid conditions prevailed by the end of the Middle Miocene, resulting in vertical and lateral facies changes from carbonates to alternating cycles of evaporites, siliciclastics, and carbonates of the South Gharib and Zeit formations (Moustafa 1977; Thiébaud and Robson 1979; Chowdhary and Taha 1986; El Naggar 1988; Ouda and Masoud 1993; Khalil and McClay 2001).

Materials and methodology

Four wells drilled in the Ras Fanar field have been assessed in this study. The wells targeted the Middle Miocene Nullipore reservoir. Routine core analysis (RCA) was performed on 794 core plug samples from the four wells. RCA involves porosity acquired by a helium porosimeter and horizontal permeability measured by a permeameter. Conventional well logs, comprising gamma-ray (GR), neutron (NPHI), density (RHOB), and compressional slowness data sampled every 0.5 ft, are available from the wells.
Fig. 3 Flow chart of the proposed methodology to predict and distribute permeability logs. Boxes in the chart, in order: (1) select wells that have cored and logged reservoir intervals with similar litho-facies; (2) select core porosity and permeability data; (3) group the reservoir into HFUs via the FZI approach; (4) select well logs that significantly affect the permeability of each HFU via PDF; (5) import data to Python; (6) group data into input features (influential logs) and target (core permeability); (7) random data split into training and testing; (8) train the ML algorithms with cross-validation and hyperparameter tuning; (9) check for high R2adj and low MAE and MSE; (10) test data validation, again checking for high R2adj and low MAE and MSE (a "No" outcome loops back, a "Yes" outcome proceeds); (11) train the model on data of wells containing the reservoir HFUs; (12) check the trained model accuracy by predicting permeability in the blind test wells; (13) predict permeability logs for the logged intervals of the studied wells; (14) populate the permeability logs via geostatistics to create the 3D permeability distribution.
The following steps were followed to achieve the aim of the study: (1) reservoir litho-facies were determined at the four wells; (2) wells that have cored and logged reservoir intervals with similar litho-facies were selected; (3) available cores from the selected wells were analyzed to identify the hydraulic flow units (HFUs) in the reservoir, relying on the concept of flow zone indicator (FZI); (4) core-log depth matching was performed; (5) well logs, including gamma-ray, resistivity, neutron, bulk density, and sonic, were analyzed at the selected wells; (6) five logs were initially selected as input features to build the permeability model, namely the sonic, density, neutron, effective porosity, and volume of minerals logs, and the probability density function (PDF) was used to investigate the relationships between the selected logs and FUs; (7) the influential logs and the reservoir FUs were integrated with permeability via five Machine Learning algorithms to predict permeability from the logs.

The dataset was split into training and testing sets to evaluate the performance of each model via three evaluation metrics: mean absolute error (MAE), mean squared error (MSE), and adjusted coefficient of determination (R2adj). Hyperparameters of the models were tuned to choose the optimal values of parameters that improve the models' performance and achieve high accuracy. The most accurate model was selected to predict the permeability logs of the studied wells. The data of two wells containing all the reservoir FUs were used to train the model. The model accuracy was checked by using other wells as blind test wells. The developed model was used to predict the permeability in logged un-cored intervals. The predicted permeability logs were populated via geostatistics to create the 3D distribution of reservoir permeability. Figure 3 indicates the workflow used in this study for predicting permeability logs via ML.

Hydraulic flow units

Petroleum geologists and engineers have recognized the need to define geological/engineering units to shape the description of reservoir zones as storage containers and conduits for fluid flow. Depositional and diagenetic processes result in the formation of different flow units in the reservoir. A flow unit (FU) was defined by Bear (1988) as a representative reservoir volume that has the same geological and petrophysical characteristics. Hearn et al. (1984) defined the FU as the reservoir portion that is continuous laterally and vertically and has similar bedding characteristics, porosity, and permeability.

To identify the trends between porosity and permeability, Amaefule et al. (1993) presented the concept of Flow Zone Indicator (FZI). The modified form of the Kozeny-Carman equation for estimating permeability is given by:

$$K = 1014.24 \, \frac{1}{F_s T^2 S_{vgr}^2} \, \frac{\Phi^3}{(1 - \Phi)^2} \tag{1}$$

where K is the permeability, md, Fs is the pore shape factor, T is the tortuosity of the path of flow, Svgr is the specific area per unit grain volume, and Φ is the effective porosity, fraction (Kozeny 1927; Carman 1937).

Since it is difficult to determine Fs, Svgr, and T, Amaefule et al. (1993) defined the FZI parameter as the square root of $1/(F_s T^2 S_{vgr}^2)$ and developed an equation to calculate FZI from core data, as follows:

$$FZI = \frac{RQI}{\Phi_z} \tag{2}$$

$$RQI = 0.0314 \sqrt{\frac{K}{\Phi}} \tag{3}$$

$$\Phi_z = \frac{\Phi}{1 - \Phi} \tag{4}$$

where RQI is the reservoir quality index, µm, and Φz is the normalized porosity.

The calculated FZI is used to group the reservoir into different FUs, through the following formula:

$$\log RQI = \log FZI + \log \Phi_z \tag{5}$$

The basis for categorizing the reservoir into flow units is identifying clusters that fall on unit-slope straight lines on the plot of log RQI vs. log Φz, where each cluster has unique geological and petrophysical characteristics (porosity and permeability) (Tiab and Donaldson 2015).

Machine learning

In this study, we aim to integrate the core permeability of each FU with the well logs via Machine Learning to predict permeability logs along the wells profile.

Machine Learning (ML) evolved as a subfield of artificial intelligence (AI), including self-learning algorithms that derive knowledge from data, instead of requiring humans to manually derive rules and build models, to improve the performance of predictive models and make data-driven decisions. ML is considered the cornerstone in the new era of big data. It has been successfully applied in the Geoscience field to predict different reservoir properties (Raschka and Mirjalili 2019; Al Khalifah et al. 2020; Alizadeh et al. 2022).

This study focuses on a specific field of ML called predictive modeling of a continuous variable (permeability). Predictive modeling focuses on developing models that create accurate predictions at the expense of explaining why
the predictions are made. Five Machine Learning (ML) algorithms were implemented via the Python programming language. The algorithms involve Linear Regression (LR), Polynomial Regression (PR), Support Vector Regression (SVR), Decision Trees (DeT), and Random Forests (RF).

Linear regression

Linear regression is a statistical method used to predict the value of a response (dependent variable) from known values of one or more independent variables (regressors). Transformation of data may be required to achieve a better fit between variables. The general form of the equation is:

$$Y = b_0 + \sum_n b_n X_n \tag{6}$$

where Y is the dependent variable, Xn are the independent variables, b0 is the intercept, and the bn are the regression coefficients.

In this study, core permeability is the dependent variable, while the FU and influential well logs are the independent variables.

Polynomial regression

If the data are more complex than a straight line, the algorithm of Polynomial Regression (PR) can be effective to fit the non-linearity. The algorithm involves adding powers of each feature as new features and thereafter trains a linear model on the new features. Moreover, PR can find relationships between features by adding combinations of features up to the given degree. For example, if the model has two input features a and b, polynomial features with degree 3 will not only add a^2, a^3, b^2, and b^3, but also the combinations ab, a^2 b, and a b^2 (Géron 2022).

Support vector regression

One of the versatile Machine Learning algorithms that are capable of performing linear or nonlinear classification and regression tasks is the Support Vector Machine (SVM). SVM is particularly well suited for complex but small- to medium-sized datasets for classification tasks (Géron 2022).

In this article, we are concerned with Support Vector Regression (SVR) to predict permeability. SVR differs from Linear Regression (LR) in searching for a hyperplane that best fits the data points in a continuous space, instead of fitting a line to the data points. Besides, in contrast to LR, which aims to minimize the sum of squared errors, the objective function of SVR is to minimize the coefficients of the variables and give the flexibility to define how much error is acceptable in the model. The error term is handled in the constraints, where the absolute error is set less than or equal to a specified margin, named the maximum error (epsilon, ɛ). The epsilon must be tuned to gain the desired accuracy of the model. The "C" hyperparameter controls the balance between samples in the decision boundary and margin violations (outliers).

To tackle non-linear problems, SVR maps the input features into a higher-dimensional space through a kernel trick. The kernel is a function that transforms the non-linear pattern to a linear one in a higher-dimensional space. Polynomial, sigmoid, and radial basis function (RBF) kernels can achieve this task. One can select the kernel function type according to the trends between the input features and the target variable. In this study, linear, polynomial, and RBF kernels were used sequentially, and the one that achieved the highest accuracy was selected to develop the SVR permeability model. Since SVR relies on distances between data points, it is beneficial to scale the input features to ensure that they fit into the same range. In this study, the features are standardized by removing the mean and scaling to unit variance (each feature then has zero mean and unit standard deviation) (Géron 2022).

Decision trees

Decision trees (DeT) is an ML algorithm that can fit complex datasets and perform classification and regression tasks. DeT are the basic components of Random Forests. DeT aims to develop a model that predicts a target variable by learning simple decision rules inferred from data features. The algorithm splits the training set into two subsets using a single feature (k) and a threshold (tk). The pair (k, tk) that produces the least mean squared error (MSE) is considered the best split. The prediction is the average target value of the training instances associated with the leaf node. The cost function that the algorithm tries to minimize is given by:

$$J(k, t_k) = \frac{m_{left}}{m} MSE_{left} + \frac{m_{right}}{m} MSE_{right} \tag{7}$$

where mright/left is the number of instances in the right/left subset and m is the total number of instances in the node being split.

The tree starts at the root node (depth 0 at the top), which tests a certain condition. In other words, the node asks whether a feature is smaller than a certain value. If the data sample satisfies this condition, it will move down to the root's left child node (depth 1, left) and so on till it reaches the leaf node (last node in the tree on the left side). On the other hand, if the sample is greater than the value of the root node, it will move to the root's right child node (depth 1, right), and so on till it reaches the leaf node (Géron 2022).
Particularly, DeT does not require feature scaling or centering at all. Moreover, the trees do not assume the linearity of data. The tree adapts itself to the training data. To avoid overfitting, hyperparameters of the tree must be regularized, particularly its depth and the minimum number of samples that can be split at each node (Géron 2022).

Random forests

A Random Forest (RF) is an ensemble of decision trees. The concept of RF is to average multiple decision trees that individually suffer from high variance to build a more robust model that has a better generalization performance. The algorithm can be summarized in four steps: (1) choose "n" samples from the training data randomly (with replacement, i.e. a bootstrap sample); (2) grow a decision tree from the selected samples by selecting some features randomly and splitting the node using the feature that provides the best split; (3) repeat steps 1 and 2 k times; and (4) aggregate the prediction of each tree and take the average value (Raschka and Mirjalili 2019).

Cross‑validation

The accuracy and prediction performance of ML models can be assessed using various techniques of cross-validation, such as random subsampling and K-fold cross-validation. Random subsampling is carried out by splitting the original dataset into two parts: training and testing sets. The ML algorithm is trained on the first part, makes predictions on the second part, and the predictions are evaluated against the expected results. This helps detect overfitting, checks the model on data it has not seen, and gives more confidence in predictions on different datasets with the same parameters. K-fold cross-validation is adopted by partitioning the training dataset into k folds (e.g. k = 5 or k = 10). Each partition is called a fold. "k−1" folds are used for training the algorithm and one is held back for model validation. This procedure is repeated so that each fold is given a chance to be used for testing. The dataset size controls the number of folds. A small number of folds leads to large bias and small variance with reduced computation time. On the other hand, a large number of folds leads to large variance and small bias with large computation time. Using tenfold cross-validation is a common choice. The performance measure is then the average of the values computed in the loop (Al-Mudhafar 2016; Brownlee 2016).

Results and discussion

Core data

The Nullipore reservoir penetrated by four wells in Ras Fanar field is mainly dolomitic Limestone with considerable amounts of Anhydrite and minor intercalations of Shale.

Fig. 4 A plot of core porosity vs. logarithm of core permeability of the four wells

Table 1 Descriptive statistics of core porosity and logarithm of permeability of the studied wells
Core parameters    Phi      LK
No. of samples     794      794
Mean               0.213    1.377
Median             0.222    1.613
Mode               0.27     −1.7
Std. deviation     0.091    1.255
Variance           0.008    1.575
Minimum            0.009    −2
Maximum            0.44     3.98
Fig. 5 A plot of log normalized porosity vs. log RQI of the core data
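The quantities plotted in Fig. 5 follow from Eqs. 2 to 4 of the methodology. The sketch below is a minimal illustration under stated assumptions: the three core plugs are hypothetical, the function names are invented for this example, and the FZI boundary of 0.99 between the two flow units is taken from the FZI ranges listed in Table 2.

```python
import math

def rqi(k_md: float, phi: float) -> float:
    """Reservoir quality index in micrometres (Eq. 3); k in md, phi as a fraction."""
    return 0.0314 * math.sqrt(k_md / phi)

def normalized_porosity(phi: float) -> float:
    """Normalized porosity, phi / (1 - phi) (Eq. 4)."""
    return phi / (1.0 - phi)

def fzi(k_md: float, phi: float) -> float:
    """Flow zone indicator (Eq. 2)."""
    return rqi(k_md, phi) / normalized_porosity(phi)

def permeability_from_fzi(fzi_value: float, phi: float) -> float:
    """Invert Eqs. 2-4 to recover permeability in md; equivalent to Eq. 1."""
    return phi * (fzi_value * normalized_porosity(phi) / 0.0314) ** 2

# Hypothetical core plugs: (permeability in md, porosity as a fraction).
plugs = [(0.8, 0.15), (25.0, 0.20), (300.0, 0.25)]
FZI_CUTOFF = 0.99  # assumed FU boundary, read from the FZI ranges in Table 2

for k_md, phi in plugs:
    value = fzi(k_md, phi)
    unit = 1 if value < FZI_CUTOFF else 2
    print(f"k={k_md} md, phi={phi}: FZI={value:.3f}, FU {unit}")
```

The last helper shows why predicting FZI (or the FU) from logs is enough to recover permeability once porosity is known: Eqs. 2 to 4 can be inverted directly.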
A plot of core porosity vs. logarithm of permeability (LK) of the cored intervals of the four wells is shown in Fig. 4. Descriptive statistics of core porosity and permeability are summarized in Table 1. The figure indicates a very poor correlation (R2 = 0.4286) with a high degree of scatter and samples of the same porosity but different permeability. This is a result of the existence of more than one FU in the reservoir, where each unit has rock and fluid flow properties different from the other. Hence, grouping the reservoir into different FUs is essential. The approach of FZI was applied. Figure 5 shows a plot of log Φz vs. log RQI of the core data. The figure indicates the existence of two FUs, where each unit has a unit slope and mean FZI (0.51 and 2.42, respectively). This reflects that each unit has fluid flow properties different from the other.

A plot of porosity vs. log of permeability for each FU is shown in Fig. 6. It is evident from the figure that the porosity–permeability correlation is improved relative to Fig. 4, where R2 is 0.74 for the first FU and 0.58 for the second one.

Fig. 6 A plot of porosity vs. log of permeability for each FU

The range of FZI, average value of FZI, porosity, and permeability of each unit are summarized in Table 2. The table indicates the notable difference in average permeability between the two FUs, hence the difference in their fluid flow capacity, where the average permeability of the first FU is 0.19, whereas that of the second FU is 1.96. This means that the second FU with a higher FZI (2.42) has better reservoir quality and percolation capacity than the first FU (FZI = 0.51).

Table 2 Average values of FZI, porosity, and permeability of the FUs
FU                   1            2
FZI range            0.07–0.99    0.99–18.23
Avg. FZI             0.52         2.88
Avg. porosity        0.2          0.23
Avg. permeability    0.19         1.96

Sensitivity analysis of well logs

Five well logs were initially selected to create the permeability model, as follows: sonic log, bulk density (RHOB), neutron porosity log (NPHI), effective porosity (phiE), and volume of minerals log (Vminerals). It must be highlighted that the effective porosity log was developed by combining the three porosity logs: neutron, density, and sonic. Besides, the volume of minerals log was created from the density log.
Table 3 Statistical parameters of well logs of the first FU
Well logs    Mean      Median    Mode     Std. deviation    Variance    Minimum    Maximum
DT           79.869    77.8      68.3     18.241            332.742     39.5       139.4
D            2.436     2.408     2.408    0.177             0.031       2.065      2.921
N            0.331     0.348     0.367    0.089             0.008       0.037      0.477
phiE         0.241     0.245     0.295    0.085             0.007       0.021      0.415
Vminerals    0.746     0.707     0.686    0.158             0.025       0.478      0.996

Table 4 Statistical parameters of well logs of the second FU
Well logs    Mean      Median    Mode     Std. deviation    Variance    Minimum    Maximum
DT           90.42     87.2      79.6     24.493            599.919     21.7       182.1
D            2.382     2.355     2.309    0.132             0.017       2.076      2.888
N            0.333     0.335     0.341    0.073             0.005       0.032      0.55
phiE         0.266     0.284     0.3      0.075             0.006       0.018      0.44
Vminerals    0.728     0.712     0.68     0.145             0.021       0.327      0.981

Fig. 7 The probability density functions of the selected logs for the reservoir FUs: a sonic log, b density log, c neutron porosity, d effective porosity log, and e volume of minerals log
Shale volume is not considered because its volume is very low in the reservoir. On the other hand, sonic, neutron, and bulk density logs are raw logs. The statistical parameters of the logs of the two FUs are shown in Tables 3 and 4, respectively.

The probability density function (PDF) was used to investigate the relationships between the selected logs and reservoir FUs. The PDF of each log is plotted for the two FUs and compared. If the two curves are distinctly separated, the log is considered a good regressor and can be used in the predictive model. On the other hand, if the curves overlap and form one cluster with almost the same mean, the log is not considered a good regressor and is discarded from the model.

The density functions of the sonic log are shown in Fig. 7a. The second curve (FU 2) has a wider range of interval transit time than the first curve. This is a result of the higher porosity and permeability of the second flow unit than the first one. Therefore, the sonic log is a good regressor and can be used in the model.
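The visual comparison of PDF curves can also be expressed as a number. The sketch below is a hedged illustration, not the procedure used in the study: it estimates each PDF with a Gaussian kernel density estimate (using the normal-reference rule of thumb for the bandwidth, an assumed choice) and integrates the pointwise minimum of the two curves as an overlap coefficient. All log values are hypothetical.

```python
import math
from statistics import stdev

def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the PDF of `samples` with Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed bandwidth choice).
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf

def overlap(samples_a, samples_b, steps=400):
    """Overlap coefficient: midpoint-rule integral of min(pdf_a, pdf_b)."""
    pdf_a, pdf_b = gaussian_kde(samples_a), gaussian_kde(samples_b)
    lo, hi = min(samples_a + samples_b), max(samples_a + samples_b)
    pad = hi - lo  # extend the grid so kernel tails are included
    lo, hi = lo - pad, hi + pad
    dx = (hi - lo) / steps
    xs = (lo + (i + 0.5) * dx for i in range(steps))
    return sum(min(pdf_a(x), pdf_b(x)) for x in xs) * dx

# Hypothetical log readings per flow unit (not the paper's data).
density_fu1 = [2.55, 2.60, 2.58, 2.62, 2.57, 2.61]
density_fu2 = [2.30, 2.33, 2.28, 2.35, 2.31, 2.29]
neutron_fu1 = [0.30, 0.33, 0.35, 0.31, 0.34, 0.36]
neutron_fu2 = [0.31, 0.32, 0.35, 0.30, 0.34, 0.37]

print("density overlap:", round(overlap(density_fu1, density_fu2), 3))
print("neutron overlap:", round(overlap(neutron_fu1, neutron_fu2), 3))
```

An overlap near 0 corresponds to distinctly separated curves (a candidate regressor), while an overlap near 1 corresponds to curves forming a single cluster. Any cutoff between the two would be a judgment call that the source does not specify.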
Figure 7b shows the density functions of the density log. The two curves are clearly separated, with different means, where the second curve has a lower density (due to higher porosity and permeability) than the first curve. Consequently, the density log is a good regressor and can be used in the model.
The density functions of the neutron log of the two flow units are shown in Fig. 7c. The two curves are very similar, with the same mean, and are not distinctly separated. Hence, no relation exists between neutron porosity and permeability, as the log gives different responses to the same permeability. So, the neutron log is discarded from the model.
The density functions of the effective porosity log are shown in Fig. 7d. The two curves are separated. The first curve has a lower mean value of porosity than the second curve. Consequently, the effective porosity log is a good regressor and can be used in the model.
The volume of minerals log involves the volumes of calcite, dolomite, and anhydrite (the main minerals in the reservoir). The density functions of the minerals log are shown in Fig. 7e. The two curves are similar and not distinctly separated, hence the log is discarded from the model.
According to the PDF, only sonic, density, and effective porosity logs can determine the reservoir FUs and can be used for building the permeability model.
Core‑logs data integration
Core permeability and the influential logs data were imported into Python. The dataset was grouped into input features and target variable. FU, Sonic (DT), Density (D), and effective porosity (phiE) logs are the input features, whereas the logarithm of core permeability (LK) is the target variable. FU was imported as a feature to get one model for the whole reservoir, instead of getting a model for each flow unit. Figure 8 shows a heat map of the well logs and LK for the two FUs of the reservoir based on the Pearson correlation coefficient (R). The figure indicates that phiE and DT logs have high positive R with LK for the two flow units (R = 0.73, 0.74, 0.63, and 0.7, respectively), while D has a high negative correlation (R = − 0.62, and − 0.63, respectively).
The dataset was randomly split into 80% training (583 data samples) and 20% testing sets (145 data samples). The random generator seed was fixed at 60 so that all models are compared on the same split.
In the case of SVR, DeT, and RF models, a grid search (LaValle et al. 2004) was conducted to select the optimum hyperparameters that provide better performance. The grid search function evaluates the model over a range of values of each hyperparameter. The ranges of the tuned hyperparameters and the values that achieved the highest accuracy are shown in Table 5.
Table 5 Ranges of hyperparameters that have been tuned, and best hyperparameters of SVR, DeT, and RF models
Model  Hyperparameter         Hyperparameter search range  Best hyperparameters
SVR    Epsilon                0.001–1                      0.001
SVR    C                      1–200                        100
DeT    Maximum depth          2–10                         5
DeT    Minimum samples split  5–50                         25
RF     Number of estimators   2–100                        50
RF     Maximum depth          2–10                         6
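The random split and the grid search described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions: the study's data are confidential, so the arrays are synthetic; the grid points are a coarse sample inside the RF ranges of Table 5; and the fold count and scoring metric are assumptions, since the text does not report them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the confidential dataset (illustration only).
# Columns mimic the four input features: FU, DT, D and phiE.
rng = np.random.default_rng(0)
n_samples = 200
X = np.column_stack([
    rng.integers(1, 3, n_samples),       # FU label: 1 or 2
    rng.normal(70.0, 8.0, n_samples),    # DT (sonic)
    rng.normal(2.5, 0.1, n_samples),     # D (density)
    rng.uniform(0.05, 0.30, n_samples),  # phiE (effective porosity)
])
# Hypothetical target: log-permeability rising with porosity and FU.
y = 8.0 * X[:, 3] + 0.5 * X[:, 0] + rng.normal(0.0, 0.2, n_samples)

# 80/20 random split with the seed reported in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=60
)

# Coarse grid inside the Table 5 ranges for RF (grid points are assumed).
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 6]}
search = GridSearchCV(
    RandomForestRegressor(random_state=60),
    param_grid,
    cv=3,                                # fold count is an assumption
    scoring="neg_mean_squared_error",    # scoring metric is an assumption
)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("test R2:", round(search.best_estimator_.score(X_test, y_test), 3))
```

Fixing `random_state` in both the split and the estimator keeps repeated runs comparable, which is the role the seed plays in the text.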
Fig. 8 Heat map of well logs and permeability of the reservoir flow units a heat map of FU 1, and b heat map of FU 2
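Each cell of such a heat map is a Pearson correlation coefficient computed within one flow unit. The sketch below shows that calculation in plain Python. The helper `pearson_r` and the sample values are hypothetical and serve only to illustrate the sign convention (positive for phiE, negative for D); they are not the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical samples per flow unit (illustration only).
samples = {
    "FU 1": {"phiE": [0.08, 0.10, 0.12, 0.15, 0.18],
             "D": [2.66, 2.63, 2.61, 2.55, 2.52],
             "LK": [0.2, 0.5, 0.4, 0.9, 1.1]},
    "FU 2": {"phiE": [0.16, 0.19, 0.22, 0.25, 0.28],
             "D": [2.50, 2.47, 2.40, 2.38, 2.31],
             "LK": [1.4, 1.9, 1.8, 2.4, 2.6]},
}

# One correlation per (flow unit, log) pair, as in one row of a heat map.
for fu, cols in samples.items():
    for log in ("phiE", "D"):
        r = pearson_r(cols[log], cols["LK"])
        print(f"{fu}: R({log}, LK) = {r:.2f}")
```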
Fig. 9 Comparison of performance of kernel tricks of SVR of training and testing sets
Fig. 10 A plot of the logarithm of predicted permeability and measured core permeability of the five ML models for the training sets
Evaluation of prediction accuracy of ML models
One or more evaluation metrics must be used to evaluate the performance of any ML model. In this study, three evaluation metrics were used to monitor the permeability prediction accuracy: mean absolute error (MAE), mean squared error (MSE), and adjusted coefficient of determination (R2adj). MAE is the arithmetic average of the absolute errors between true values and predicted ones, where:
$$\mathrm{MAE} = \frac{1}{n} \sum \left| Y - \hat{Y} \right|$$ (8)
where Y is the logarithm of core permeability (LK), $\hat{Y}$ is the predicted logarithm of permeability, and n is the number of samples.
MSE is the average squared difference between true and predicted values, where:
$$\mathrm{MSE} = \frac{1}{n} \sum \left( Y - \hat{Y} \right)^{2}$$ (9)
Adjusted R2 is a statistical metric used to evaluate the goodness of fit of a regression model. It takes into account only the significant predictors that explain the variability in the data and improve the model performance. It is expressed by:
$$R^{2}_{adj} = 1 - \frac{\mathrm{SSE}/(n - p)}{\mathrm{SST}/(n - 1)}$$ (10)
where SSE is the error sum of squares, n is the number of samples, p is the number of predictors, and SST is the total sum of squares.
Performance of ML models
In this section, the permeability prediction performance of the LR, PR, SVR (with the kernel trick that provides the best performance), DeT, and RF models is evaluated and compared to core permeability for both training and testing sets. Performance of the linear, polynomial, and radial basis
function kernel tricks of SVR for both training and testing sets is shown in Fig. 9. It is observed that the RBF kernel trick outperforms the linear and polynomial tricks, where R2adj is 0.77, MAE is 0.34, and MSE is 0.29 for both training and testing sets, while in the case of the linear trick, R2 is 0.66, MAE is 0.44, and MSE is 0.37 for both sets. Results of the polynomial trick show R2 = 0.67, MAE = 0.44, and MSE = 0.42 for both sets. Consequently, the RBF trick is used to build the SVR permeability model.
Figures 10 and 11 show the logarithm of predicted permeability vs. the measured core permeability of the training and testing sets of the five ML models, respectively. Figure 12 indicates the evaluation metrics (R2adj, MAE, and MSE) of the training and testing sets of the five models.
Fig. 11 A plot of the logarithm of predicted permeability and measured core permeability of the five ML models for the testing sets
Fig. 12 Validation metrics of permeability prediction of the five ML models for the training and testing sets
It is observed that poor correlation and high error appear in the case of the LR model where R2adj is 0.69, MAE is 0.44, and MSE is 0.36 for both the training and testing sets. The data samples are away from the 45-degree line (the farther the data is away from the 45-degree line, the lower the prediction accuracy, and vice versa). This can be attributed to the complex non-linear relation between the well logs and the permeability that the LR cannot capture since it assumes a linear relation between variables. Better correlation and lower error than the LR are obvious in the case of the SVR model, where R2adj is 0.77, MAE is 0.34, and MSE is 0.29 for both the training and testing sets. This is attributed to the ability of SVR to track the non-linearity between variables via the radial basis function kernel trick. The performance of the PR model is slightly better, where R2adj is 0.81 for the training set and 0.79 for the testing set, MAE is 0.38, and MSE is 0.29 for both sets. PR tries to track the non-linearity by increasing the polynomial degree of the input features (degree 5 in this case offers better performance than other polynomial degrees), hence correlation increases. The performance of the DeT model is very close to that of PR, where R2adj is 0.81 for the training set, 0.78 for the testing set, MAE is 0.39, and MSE is 0.28
for both sets. The RF model provides the highest correlation and the least error among the ML models, where R2adj reaches 0.87 for the training set and 0.82 for the testing set, MAE is 0.32 for both sets, and MSE is 0.19 for both sets. The RF cross plot indicates that the data samples are closer to the 45-degree line than in other models. RF provides high tree diversity via searching for the best feature among a random subset of features (not the whole feature set, as DeT does), hence increasing the chance of determining the complex relationships between the features (well logs) and target variable (permeability), and providing the best accuracy among all models.
Fig. 13 Comparison of the logarithm of predicted permeability and core permeability of the train wells a train well 1, and b train well 2
Predicting permeability in the studied wells
Since the RF algorithm provides the most accurate prediction results, it is used to predict the permeability logs in the studied wells. A new approach is presented for prediction, where two wells involving the two FUs were considered as key wells to train and develop the RF model, and the other two wells as blind test wells to evaluate the model performance. The comparison between the logarithm of predicted permeability and core permeability for the train wells, and the test wells is shown in Figs. 13a, b, 14a, b, respectively. Overall, the two figures indicate a good match between the
predicted permeability and core permeability for the four wells. The approach of selecting wells constituting the reservoir FUs to train the RF model enables the model to select input and output data for the whole reservoir FUs. This enables the model to detect the patterns between the input and output data for the whole permeability variation range, hence capturing the permeability heterogeneity of the reservoir.
Fig. 14 Comparison of the logarithm of predicted permeability and core permeability of the test wells a test well 1, and b test well 2
It must be highlighted that some high permeability values are underestimated by the model since the model ignored the higher values while fitting the pattern in the data; fitting them would come at the expense of the model's ability to fit the test (unseen) data. If the model fits the whole dataset that has a wide range of values, the overfitting problem will occur. Cross-validation was adopted and the pertinent hyperparameters were tuned for their optimized values to prevent this problem. This helps to capture the permeability heterogeneity and develop the best performing predictive model.
The developed RF model was used to predict the permeability logs for the logged uncored intervals of the reservoir for the four wells. The permeability logs of the four wells are shown in Fig. 15.
The wells data was imported into Petrel software. The predicted permeability logs were upscaled and populated in the geological model by setting the appropriate variogram model and applying the Gaussian Random Function Simulation geostatistical method (GRFS). GRFS is
a stochastic method that can produce local variation and reproduce input histograms. It honors well data, distributions of inputs, variogram, and trends. Figure 16 indicates the histogram of the K-log and upscaled log. It is evident from the figure that the upscaled data honors the input data, where 2.4% of data is > 1 md, 22% is between 3 and 9 md, 39% is between 30 and 90 md, 31.7% is between 300 and 900 md and 4.9% is > 2000 md. The parameters of the developed permeability model are summarized in Table 6. Figure 17 shows the 3D permeability distribution of the reservoir.
Fig. 15 The permeability logs of the whole logged reservoir intervals for the studied wells
Summary and conclusions
This study provides a systematic approach for predicting and distributing permeability logs from conventional well logs via Machine Learning in the Middle Miocene Nullipore carbonate reservoir. Nullipore reservoir is highly heterogeneous with wide permeability distribution. Categorizing the cored intervals into flow units improves the reservoir characterization and permeability prediction. In this study, five ML algorithms were applied to integrate the core permeability of each flow unit with the influential
well logs, and their performance was evaluated. The models involve Linear Regression (LR), Polynomial Regression (PR), Support Vector Regression (SVR), Decision Trees (DeT), and Random Forests (RF). Cross-validation of the dataset and hyperparameters' tuning of the models reveals that RF is the most powerful ML model in tackling the complex non-linear relationships between the influential well logs and permeability, based on three evaluation metrics (R2adj, MAE, and MSE). Splitting the data into train and test sets allows for producing exterior prediction given the test dataset rather than prediction on the same data (interior prediction). A new approach is presented to predict the permeability logs in each well. The approach involves training the RF model on the core and logging data of two wells containing all the reservoir flow units and using the other wells as blind test wells. This aids in capturing the heterogeneity of reservoir permeability. Gaussian Random Function Simulation geostatistical method was used to populate the predicted permeability logs to obtain 3D distribution for the reservoir permeability.
Fig. 16 Histogram of the K-log and upscaled log
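The three evaluation metrics used to rank the models (Eqs. 8 to 10) can be written out in a few lines of plain Python. The sketch follows Eq. 10 as printed, with n − p in the numerator term; many texts use n − p − 1 when p excludes the intercept. The function names and the sample values are hypothetical and are not taken from the study.

```python
def mae(y_true, y_pred):
    """Mean absolute error (Eq. 8)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error (Eq. 9)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R2 as printed in Eq. 10: 1 - [SSE/(n - p)] / [SST/(n - 1)]."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - (sse / (n - n_predictors)) / (sst / (n - 1))


# Hypothetical log-permeability values (illustration only).
lk_core = [0.5, 1.2, 2.0, 2.6, 3.1, 1.8]
lk_pred = [0.7, 1.0, 2.1, 2.4, 3.0, 2.0]

print("MAE   :", round(mae(lk_core, lk_pred), 3))
print("MSE   :", round(mse(lk_core, lk_pred), 3))
# Four predictors, matching the FU, DT, D and phiE input features.
print("R2adj :", round(adjusted_r2(lk_core, lk_pred, n_predictors=4), 3))
```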
Table 6 Parameters of the permeability model
Method                               Property      Variogram  Major direction range  Minor direction range  Vertical range
Gaussian random function simulation  Permeability  Spherical  725                    725                    82.2
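For reference, a standard spherical variogram with the three ranges of Table 6 can be evaluated as in the sketch below. It assumes a unit sill, no nugget, and geometric anisotropy along the major, minor, and vertical axes. These choices and the function name are assumptions for illustration; the table does not state units, sill, or nugget, and the sketch does not reproduce Petrel's internal implementation.

```python
from math import sqrt

# Major, minor and vertical ranges from Table 6 (units not stated there).
RANGES = (725.0, 725.0, 82.2)


def spherical_variogram(h_major, h_minor, h_vertical, sill=1.0):
    """Spherical variogram value for a lag vector, with geometric anisotropy.

    Each lag component is scaled by its range, so the scaled distance
    reaches 1 exactly at the range along any principal direction.
    """
    lags = (h_major, h_minor, h_vertical)
    h = sqrt(sum((lag / rng) ** 2 for lag, rng in zip(lags, RANGES)))
    if h >= 1.0:
        return sill  # beyond the range: no spatial correlation remains
    return sill * (1.5 * h - 0.5 * h ** 3)


# Halfway to the major range versus the same distance vertically.
print(spherical_variogram(362.5, 0.0, 0.0))
print(spherical_variogram(0.0, 0.0, 362.5))
```

The much shorter vertical range means that permeability decorrelates faster between layers than along them, which is why the second call already returns the sill.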

Fig. 17 3D distribution of permeability of Nullipore reservoir
Author contributions Mostafa Khalid: conceptualization, methodology, writing, software, programming language. Ahmed Mansour: supervision, visualization, writing - review and editing, validation. Saad El-Din Desouky: review and editing. Walaa Afify: supervision, visualization, review, and editing. Sayed Ahmed: visualization, review, and editing. Osama Elnaggar: review and editing.
Funding Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).
Data availability The data is confidential.
Declarations
Conflict of interests The authors confirm that they have no competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Abbas MA, Al-Mudhafar WJ, Wood DA (2023) Improving permeability prediction in carbonate reservoirs through gradient boosting hyperparameter tuning. Earth Sci Inf 16:1–16
Akande KO, Owolabi TO, Olatunji SO, AbdulRaheem A (2017) A hybrid particle swarm optimization and support vector regression model for modelling permeability prediction of hydrocarbon reservoir. J Petrol Sci Eng 150:43–53
Al Khalifah H, Glover P, Lorinczi P (2020) Permeability prediction and diagenesis in tight carbonates using machine learning techniques. Mar Pet Geol 112:104096
Alizadeh N, Rahmati N, Najafi A, Leung E, Adabnezhad P (2022) A novel approach by integrating the core derived FZI and well logging data into artificial neural network model for improved permeability prediction in a heterogeneous gas reservoir. J Petrol Sci Eng 214:110573
Al-Mudhafar WJ (2016) Incorporation of bootstrapping and cross-validation for efficient multivariate facies and petrophysical modeling. In: SPE rocky mountain petroleum technology conference/low-permeability reservoirs symposium. SPE, pp. SPE-180277-MS
Amaefule JO, Altunbay M, Tiab D, Kersey DG, Keelan DK (1993) Enhanced reservoir description: using core and log data to identify hydraulic (flow) units and predict permeability in uncored intervals/wells. In: SPE annual technical conference and exhibition. OnePetro
Aminian K, Ameri S, Oyerokun A, Thomas B (2003) Prediction of flow units and permeability using artificial neural networks. In: SPE Western Regional/AAPG Pacific Section Joint Meeting. OnePetro
Balan B, Mohaghegh S, Ameri S (1995) State-of-the-art in permeability determination from well log data: Part 1-A comparative study, model development. In: SPE Eastern Regional Meeting. OnePetro
Bear J (1988) Dynamics of fluids in porous media. Courier Corporation
Brownlee J (2016) Machine learning mastery with Python: understand your data, create accurate models, and work projects end-to-end. Machine Learning Mastery, San Francisco
Carman PC (1937) Fluid flow through a granular bed. Trans Inst Chem Eng London 15:150–156
Chowdhary L, Taha S (1986) History of exploration and geology of Ras Budran, Ras Fanar and Zeit Bay oil fields, Egyptian General Petroleum Corporation. In: 8th exploration conference, pp 42
Djebbas F, Ameur-Zaimeche O, Kechiched R, Heddam S, Wood DA, Movahed Z (2023) Integrating hydraulic flow unit concept and adaptive neuro-fuzzy inference system to accurately estimate permeability in heterogeneous reservoirs: Case study Sif Fatima oilfield, southern Algeria. J Afr Earth Sc 206:105027
El Naggar A (1988) Geology of Ras Gharib, Shoab Gharib and Ras Fanar oil fields. SUCO International Report 88:522
Elkatatny S, Mahmoud M, Tariq Z, Abdulraheem A (2018) New insights into the prediction of heterogeneous carbonate reservoir permeability from well logs using artificial intelligence network. Neural Comput Appl 30:2673–2683
Géron A (2022) Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media Inc
Hearn C, Ebanks W, Tye R, Ranganathan V (1984) Geological factors influencing reservoir performance of the Hartzog Draw Field, Wyoming. J Petrol Technol 36(08):1335–1344
Hosny W, Gaafar I, Sabour A (1986) Miocene stratigraphic nomenclature in the Gulf of Suez region. In: Proceedings of the 8th Exploration Conference: Cairo, Egyptian General Petroleum Corporation, pp 131–148
Kamali MZ, Davoodi S, Ghorbani H, Wood DA, Mohamadian N, Lajmorak S, Rukavishnikov VS, Taherizade F, Band SS (2022) Permeability prediction of heterogeneous carbonate gas condensate reservoirs applying group method of data handling. Mar Pet Geol 139:105597
Khalid M, Desouky SE-D, Rashed M, Shazly T, Sediek K (2020) Application of hydraulic flow units' approach for improving reservoir characterization and predicting permeability. J Petrol Exploration Prod Technol 10(2):467–479
Khalil S, McClay K (2001) Tectonic evolution of the NW Red Sea-Gulf of Suez rift system. Geol Soc Lond Special Publ 187(1):453–473
Kharrat R, Mahdavi R, Bagherpour MH, Hejri S (2009) Rock type and permeability prediction of a heterogeneous carbonate reservoir using artificial neural networks based on flow zone index approach. In: SPE middle east oil and gas show and conference. OnePetro
Kozeny J (1927) Über kapillare leitung der wasser in boden. Royal Academy of Science
LaValle SM, Branicky MS, Lindemann SR (2004) On the relationship between classical grid search and probabilistic roadmaps. Int J Robot Res 23(7–8):673–692
Lv A, Cheng L, Aghighi MA, Masoumi H, Roshan H (2021) A novel workflow based on physics-informed machine learning to determine the permeability profile of fractured coal seams using downhole geophysical logs. Mar Pet Geol 131:105171
Mahdy A, Zakaria W, Helmi A, Helaly AS, Mahmoud AM (2024) Machine learning approach for core permeability prediction from well logs in Sandstone Reservoir, Mediterranean Sea, Egypt. J Appl Geophys 220:105249
Male F, Jensen JL, Lake LW (2020) Comparison of permeability predictions on cemented sandstones with physics-based and machine learning approaches. J Nat Gas Sci Eng 77:103244
Matinkia M, Hashami R, Mehrad M, Hajsaeedi MR, Velayati A (2023) Prediction of permeability from well logs using a new hybrid machine learning algorithm. Petroleum 9(1):108–123
Moustafa A (1976) Block faulting in the Gulf of Suez. In: Proceedings of the 5th Egyptian general petroleum corporation exploration seminar, Cairo, Egypt
Moustafa A (1977) The Nullipore of Ras Gharib field. Deminex, Egypt Branch. Report EP. 24(77): 107.
Okon AN, Adewole SE, Uguma EM (2021) Artificial neural network model for reservoir petrophysical properties: porosity, permeability and water saturation prediction. Model Earth Syst Environ 7(4):2373–2390
Ouda K, Masoud M (1993) Sedimentation history and geological evolution of the Gulf of Suez during the Late Oligocene-Miocene. Geodyn Sediment Red Sea-Gulf of Aden Rift Syst 1:47–88
Raschka S, Mirjalili V (2019) Python machine learning: machine learning and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing Ltd
Rateb R (1988) Miocene planktonic foraminiferal analysis and its stratigraphic application in the Gulf of Suez region. In: 9th EGPC exploration production conference, pp 1–21
Souaya FJ (1965) Miocene foraminifera of the Gulf of Suez region, UAR; part 1, systematics (Astrorhizoidea-Buliminoidea). Micropaleontology 11(3):301–334
Thiébaud CE, Robson DA (1979) The geology of the area between Wadi Wardan and Wadi Gharandal, east Clysmic rift, Sinai, Egypt. J Petrol Geol 1(4):63–75
Tiab D, Donaldson EC (2015) Petrophysics: theory and practice of measuring reservoir rock and fluid transport properties. Gulf professional publishing
Wendt W, Sakurai ST, Nelson P (1986) Permeability prediction from well logs using multiple regression, reservoir characterization. Elsevier, pp 181–221
Xue G, Datta-Gupta A, Valko P, Blasingame T (1997) Optimal transformations for multiple regression: application to permeability estimation from well logs. SPE Form Eval 12(02):85–93
Zhu L-Q, Zhang C, Wei Y, Zhang C-M (2017) Permeability prediction of the tight sandstone reservoirs using hybrid intelligent algorithm and nuclear magnetic resonance logging data. Arab J Sci Eng 42(4):1643–1654
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
