
Fuel 278 (2020) 118358

Contents lists available at ScienceDirect

Fuel
journal homepage: www.elsevier.com/locate/fuel

Full Length Article

Prediction of methane adsorption in shale: Classical models and machine learning based models

Meng Meng a,*, Ruizhi Zhong b, Zhili Wei c

a Earth and Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
b School of Chemical Engineering, The University of Queensland, Brisbane, Australia
c Department of Earth and Atmospheric Sciences, University of Houston, Houston, TX, USA

ARTICLE INFO

Keywords: Shale gas; Adsorption model; Classical model; Machine learning; XGBoost

ABSTRACT

Shale gas contributes significantly to current global energy consumption, and an accurate estimation of geological gas-in-place (GIP) determines an optimal production plan. As the dominant form of storage, adsorbed gas in shale formations is of primary importance to assess. This paper summarizes adsorption models into traditional pressure/density dependent isothermal models, the pressure and temperature unified model, and machine learning based models. Using a comprehensive experimental dataset, these models are applied to simulate shale gas adsorption under in-situ conditions. Results show that the modified Dubinin-Radushkevich (DR) model provides the optimal performance among the traditional isothermal models. The pressure and temperature unified model makes a breakthrough beyond isothermal conditions and can extrapolate predictions beyond the tested temperature range. Well-trained machine learning models not only break the limits of the isothermal condition and shale formation type, but can also provide reasonable extrapolations beyond the tested ranges of temperature, total organic carbon (TOC), and moisture. Four popular machine learning algorithms are used: artificial neural network (ANN), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost). The XGBoost model is found to provide the best results for predicting shale gas adsorption, and it can be conveniently updated for broader applications as more data become available. Overall, this paper demonstrates the capability of machine learning for the prediction of shale gas adsorption, and the well-trained model can potentially be built into a large numerical framework to optimize production curves of shale gas.

1. Introduction

Shale gas has become an important contributor to the current global energy structure. In the U.S., the shale gas revolution has shifted the U.S. from an oil importer to an oil exporter. Natural gas production grew by 10.0 billion cubic feet per day (Bcf/d) in 2018, and shale gas now accounts for more than 60% of U.S. dry natural gas production [1]. In China, massive investments have been put into the exploration of shale gas. Shale gas production increased by 40% between 2017 and 2018, reaching 20.0 million cubic meters per day [2]. Despite this widespread importance, substantial uncertainties exist in assessing the quantity of recoverable shale gas [3]. The primary uncertainty is the assessment of natural gas adsorption under in-situ conditions, which influences the determination of the geological gas-in-place (GIP) quantity and the production life of shale gas wells. Natural gas is stored in the porous space of shale in three phases: free gas, adsorbed gas, and dissolved gas [3-9]. The adsorbed gas accounts for 20~85% of total shale gas-in-place [3,4]. Therefore, the prediction of adsorbed gas is important for gas production.

Unlike coal bed methane, which is usually buried at around 1000 m depth, most shale formations are located much deeper, from 1000 m to 3000 m. The reservoir pressure can go up to 30 MPa and the reservoir temperature can be as high as 90 °C [5,7]. Extensive adsorption experiments of natural gas in shale have been conducted under different pressures and temperatures [6,8-12]. Based on these experimental data, an accurate prediction model for methane adsorption in shale is significant for estimating the GIP and proposing an optimal production plan. Most researchers [13-19] used classical isothermal functions to simulate shale gas adsorption. These classical models can mainly be divided into monolayer adsorption models (Langmuir style models), multilayer adsorption models (Toth models), and pore-filling models (DR models). Despite the wide application of traditional isothermal models, no one can deny their deficiencies regarding the isothermal and homogeneity assumptions. To overcome these


Corresponding author at: Los Alamos National Laboratory, EES-14 Geomechanics, USA.
E-mail address: mengm@lanl.gov (M. Meng).

https://doi.org/10.1016/j.fuel.2020.118358
Received 7 March 2020; Received in revised form 3 June 2020; Accepted 8 June 2020
Available online 20 June 2020
0016-2361/ © 2020 Elsevier Ltd. All rights reserved.

Nomenclature

RMSE       Root mean squared error
SSE        Sum of squared errors
TOC        Total organic content
XGBoost    Extreme gradient boosting
ANN        Artificial neural network
RF         Random forest
SVM        Support vector machine
R2         Coefficient of determination
ne         Excess adsorbed amount, mmol/g
nabsolute  Absolute adsorbed amount, mmol/g
nmax       Maximum adsorbed amount, mmol/g
K0         Temperature dependent Langmuir constant
Va         Volume of the condensed phase per unit mass of shale
ρg         Density of gas at a certain temperature and pressure
ρa         Density of gas in the adsorbed phase
ρfreegas   Density of free gas
k          Parameter related to the affinity of shale for gas
t          Equilibrium constant which reflects the heterogeneity of the adsorbent
D          Constant related to the affinity of shale for gas
P          Pressure, MPa
K1, K2     Langmuir constants for the dual-site Langmuir model
A1, A2     Pre-exponential coefficients
E1, E2     Constants of energy of adsorption
R          Ideal gas constant

deficiencies, Tang [3] innovatively used the idea of dual adsorption sites and proposed a dual-site Langmuir equation, which is similar to the dual-porosity model in the poroelasticity theory of the deformation-diffusion response of rock [20-22]. After comparing Langmuir-style models, Toth models, and DR style models, they concluded that the dual-site Langmuir model is superior to other models in interpreting observed phenomena and extrapolating adsorption isotherms beyond the test data [5]. In reality, shale has an abundance of different adsorption sites, and a two-site model still simplifies the surface heterogeneity effect. Li [7] proposed a multi-site model and showed that their model is effective not only in predicting adsorption isotherms but also in estimating the thermodynamic parameters of energetically heterogeneous shales.

The dual-site and multi-site models successfully break the limit of isothermal conditions. However, these models are mathematically more complex and require global optimization of several parameters. In addition, they still cannot break the limit of shale types, and the relevant underlying assumptions should not be taken for granted [5]. These limits restrict the generality of these models in a large numerical framework for predicting and optimizing the production curves of shale gas. After years of development, machine learning has become a powerful tool to build predictive models [23-29] that can uncover hidden patterns and unknown correlations between variables. Currently, only a few applications of machine learning have addressed shale gas adsorption evaluation. Meng [28] attempted to use an ANN for predicting supercritical CO2 adsorption in coal. Results show that it not only provides results as accurate as Langmuir-style models, but also breaks the limits of the isothermal condition and coal type. Li [29] used a K-Nearest Neighbor algorithm to estimate the adsorbed shale gas content from geological parameters. However, Li's method ignored the laboratory experimental data, so it cannot completely substitute for adsorption experiments and can only be used for a rough estimation of gas content in the whole reservoir. Therefore, machine learning is a promising tool to predict shale gas adsorption, but more work is required.

In this paper, we used a large amount of laboratory adsorption data to estimate the geological GIP and the production life of shale gas wells. A total of 630 adsorption experimental data points were compiled and analyzed, with 278 data points for classical mechanistic models and 352 data points for machine learning models. Adsorption data may have different units, and they are unified to the same units for optimization [28]. The comparison between classical models and machine learning based models is comprehensively provided. Based on the results of the optimized machine learning based models, this paper can predict shale adsorption behavior and be incorporated into the modeling of shale gas production plans.

Table 1
Classical adsorption isotherm models.

Langmuir style models:
  Model 1 [18,19]:  n_e = [n_max K_0 P / (1 + K_0 P)] (1 - ρ_g/ρ_a)                       (1)
  Model 2 [14]:     n_e = [n_max K_0 P / (1 + K_0 P)] (1 - ρ_g/ρ_a) + kP                  (2)
  Model 3 [5]:      n_e = [n_max K_0 P / (1 + K_0 P)] (1 - ρ_g/ρ_a) + kP (1 - ρ_g/ρ_a)    (3)
  Model 4 [5]:      n_e = n_max K_0 P / (1 + K_0 P) - V_a ρ_g                             (4)

Toth models:
  Model 5 [5]:      n_e = n_max K_0 P / [1 + (K_0 P)^t]^(1/t) - V_a ρ_g                   (5)

DR style models:
  Model 6 [13]:     n_e = n_max (1 - ρ_g/ρ_a) exp{-D [ln(ρ_a/ρ_g)]^2}                     (6)
  Model 7 [14]:     n_e = n_max (1 - ρ_g/ρ_a) exp{-D [ln(ρ_a/ρ_g)]^2} + kP                (7)
  Model 8 [14]:     n_e = n_max (1 - ρ_g/ρ_a) exp{-D [ln(ρ_a/ρ_g)]^2} + kP (1 - ρ_g/ρ_a)  (8)

Notes: In the above equations, ρ_g is the density of gas at a certain temperature and pressure; ρ_a is the density of gas in the adsorbed phase, which is usually assumed to be a constant value; n_max is the maximum adsorbed amount; K_0 is the Langmuir constant; P is the pressure; k is the parameter related to the affinity of shale for gas; V_a is the volume of the condensed phase per unit mass of shale; t is the equilibrium constant which reflects the heterogeneity of the adsorbent; D is a constant related to the affinity of shale for gas.
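To make concrete how the isotherms in Table 1 are fitted to data, the sketch below fits Model 1 (the excess Langmuir form of Eq. (1)) by nonlinear least squares. SciPy is an assumed tool choice, and all numbers (adsorbed-phase density, the density trend, and the "true" parameters) are invented for demonstration; they are not values from the paper's dataset.

```python
# Hypothetical fit of Model 1 (excess Langmuir) to synthetic isotherm data.
import numpy as np
from scipy.optimize import curve_fit

RHO_A = 0.423  # assumed adsorbed-phase density of methane (g/cm^3)

def model1_excess(inputs, n_max, K0):
    # Eq. (1): ne = n_max*K0*P/(1 + K0*P) * (1 - rho_g/rho_a)
    P, rho_g = inputs
    return n_max * K0 * P / (1.0 + K0 * P) * (1.0 - rho_g / RHO_A)

P = np.linspace(1.0, 50.0, 20)                  # pressures, MPa
rho_g = 0.0068 * P / (1.0 + 0.004 * P)          # crude free-gas density trend, illustrative
true_ne = model1_excess((P, rho_g), 5.0, 0.2)   # "true" parameters for the demo
ne_obs = true_ne + np.random.default_rng(0).normal(0.0, 0.01, P.size)

(n_max_fit, K0_fit), _ = curve_fit(model1_excess, (P, rho_g), ne_obs, p0=[1.0, 0.1])
# n_max_fit and K0_fit should land near the "true" 5.0 and 0.2
```

The other models in Table 1 can be fitted the same way by swapping the model function; Models 2-8 simply add the k, V_a, t, or D parameters to the signature.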


Table 3
Best three isothermal models in all 20 cases.

Case ID   Temperature     Best three isothermal models
1         313.75 K [8]    Models 8, 7, 5
2         333.75 K [8]    Models 7, 2, 8
3         348.75 K [8]    Models 7, 2, 8
4         368.75 K [8]    Models 7, 8, 2
5         303.15 K [6]    Models 7, 2, 8
6         323.15 K [6]    Models 3, 7, 2
7         343.15 K [6]    Models 2, 3, 8
8         303.15 K [7]    Models 8, 7, 5
9         333.15 K [7]    Models 5, 8, 7
10        363.75 K [7]    Models 8, 5, 7
11        333.15 K [31]   Models 3, 5, 4
12        373.15 K [31]   Models 5, 3, 8
13        413.15 K [31]   Models 5, 3, 8
14        318.15 K [5]    Models 5, 8, 7
15        333.15 K [5]    Models 5, 8, 7
16        348.15 K [5]    Models 5, 8, 7
17        355.15 K [5]    Models 5, 8, 7
18        313.75 K [30]   Models 8, 7, 3
19        348.75 K [30]   Models 5, 7, 8
20        368.75 K [30]   Models 5, 8, 7

2. Methodology

2.1. Classical pressure/density dependent isothermal models

The most commonly used models for adsorption are pressure/density dependent isothermal models, which use pressure or density as the variable under an isothermal condition. Similar to the work by Tang [5], the most typical models are summarized in Table 1. All of them are optimized using the published adsorption data [5-8,30,31].

2.2. Pressure and temperature unified model

Different from the pressure/density dependent isothermal models, the pressure and temperature unified model breaks the limit of isothermal conditions. Tang [3] proposed a so-called "dual-site Langmuir model" including the influence of temperature, which is shown in Eq. (9):

n_e = (n_max - V_max ρ_g) [(1 - α) K_1 P / (1 + K_1 P) + α K_2 P / (1 + K_2 P)],
K_1(T) = A_1 exp(E_1/RT),  K_2(T) = A_2 exp(E_2/RT)   (9)

The temperature is introduced in this model through a modification of the Langmuir constant K. Instead of considering K a constant, Tang [3] used the temperature dependent expression A exp(E/RT) to calculate K. The parameter α is a weighting coefficient, which represents the fraction of the second type of adsorption site [3]. With this modification, the temperature is included in the model, and the new model is called the pressure and temperature unified model. Theoretically, this unified model should perform better, subject to verification against the experimental data. The same set of data is used for optimization as that applied to the traditional isothermal models in Section 2.1.
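A direct transcription of the dual-site unified model can make the temperature dependence concrete. The sketch below evaluates the dual-site excess form with K_i(T) = A_i exp(E_i/RT); all numerical values are illustrative placeholders, not fitted parameters from the paper's Table 4.

```python
# Sketch of the dual-site pressure-temperature unified model (Eq. (9)),
# with temperature-dependent Langmuir "constants". Parameter values are made up.
import numpy as np

R = 8.314  # ideal gas constant, J/(mol*K)

def k_of_T(A, E, T):
    # Temperature-dependent Langmuir constant: K(T) = A*exp(E/(R*T))
    return A * np.exp(E / (R * T))

def ne_unified(P, T, rho_g, alpha, A1, E1, A2, E2, n_max, V_max):
    # Two Langmuir terms weighted by alpha, scaled by the excess factor
    K1, K2 = k_of_T(A1, E1, T), k_of_T(A2, E2, T)
    site1 = (1.0 - alpha) * K1 * P / (1.0 + K1 * P)
    site2 = alpha * K2 * P / (1.0 + K2 * P)
    return (n_max - V_max * rho_g) * (site1 + site2)

# One evaluation at 313.75 K and 10 MPa with made-up parameters
ne = ne_unified(P=10.0, T=313.75, rho_g=0.07, alpha=0.4,
                A1=2e-3, E1=14e3, A2=3e-7, E2=32e3, n_max=5.4, V_max=12.0)
```

Because T enters only through K_1 and K_2, one global fit over all isotherms of a sample yields a single parameter set valid at any temperature, which is what allows the extrapolations discussed in Section 3.2.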

Fig. 1. Number of best performance cases for different models. [Bar chart: number of cases (0-20) versus type of model.]

2.3. Machine learning based models

In the field, there are various factors (variables) influencing the adsorption amount of methane in shale. These factors include not only external factors such as pressure, temperature, and moisture, but also internal factors such as total organic content (TOC), thermal maturity, and clay minerals. However, neither the traditional isothermal models nor the pressure and temperature unified model can break the limitation of shale types (e.g., TOC, thermal maturity, clay components, moisture). With enough data, machine learning based models can include all the related variables. In the machine learning model, the prediction of methane adsorption on shale is a multivariate regression problem.

Although many variables contribute to methane adsorption on shale, it is not necessary to apply all of them in the machine learning optimization. First, it is technically difficult to obtain all the data. The measurement of moisture, TOC, and thermal maturity requires different equipment, and individual researchers usually do not perform all of these basic measurements; that is why these fundamental properties are often not reported completely in publications. Second, some variables, especially mineral components (TOC, thermal maturity, clay minerals), are not completely independent of each other. For example, Zhao [32] found a linear relationship between TOC and clay minerals for shale. Last but not least, the contributions of these variables to methane adsorption on shale are not equal. The external factors, such as pressure and temperature, are well known and important. Among the other factors, Tan [4] mentioned that TOC is the primary controlling factor on methane sorption capacity, and clay minerals are secondary in importance compared to the TOC content.

Beaton [10-12] worked extensively on adsorption measurements of methane on shale. They provided the value of TOC for all different

Table 2
Optimized parameters and performance of isothermal models at 313.75 K using Chen's data [8].

Model     n_max   K_0     k       V_a      t       D       SSE     R2      Adjusted R2  RMSE
Model 1   5.3520  0.1786  -       -        -       -       0.1601  0.9805  0.9785       0.1265
Model 2   4.7910  0.2216  0.0058  -        -       -       0.1153  0.9859  0.9828       0.1132
Model 3   4.3630  0.2536  0.0215  -        -       -       0.1101  0.9866  0.9836       0.1106
Model 4   5.0650  0.2276  -       10.3200  -       -       0.1201  0.9853  0.9821       0.1155
Model 5   7.9290  0.2084  -       15.4000  0.6643  -       0.0965  0.9882  0.9838       0.1098
Model 6   4.7490  -       -       -        -       0.0965  0.1877  0.9771  0.9748       0.1370
Model 7   4.3150  -       0.0073  -        -       0.0855  0.0771  0.9906  0.9885       0.0925
Model 8   4.0460  -       0.0226  -        -       0.0813  0.0727  0.9911  0.9892       0.0899
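Table 2 reports an adjusted R2 alongside R2. The standard adjustment, which penalizes R2 for the number of fitted parameters, can be written in a few lines; the example numbers below are invented, not taken from the table.

```python
# Standard adjusted R^2: penalize R^2 for p fitted parameters given n points.
def adjusted_r2(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# e.g. a hypothetical 3-parameter model with R^2 = 0.9905 fitted to 12 points
adj = adjusted_r2(0.9905, n=12, p=3)
```

This is why, in Table 2, models with more parameters (e.g., Model 5) show a larger gap between R2 and adjusted R2 than the two-parameter models.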


[Fig. 2 shows six panels of adsorption (mmol/g) versus pressure (MPa) at multiple temperatures: (a) Chen's data [8]; (b) Jiang's data [6]; (c) Li's data [7]; (d) Li's data [31]; (e) Tang's data [5]; (f) Xiong's data [30].]

Fig. 2. Performance of Model 8 for different groups of data. (Mod is model results; Exp is experimental results).

Table 4
Parameters and performance of the unified model (Eq. (9)) for different groups of data.

Parameter          Chen's data [8]  Jiang's data [6]  Li's data [7]  Li's data [31]  Tang's data [5]  Xiong's data [30]
α                  0.42             0.999             0.368          0.802           0.806            0.678
A1                 0.002            0.0024            0.002          5.71E-10        1.64E-04         0.029
A2                 2.82E-07         1.92E-06          0.013          1.16E-02        2.78E-04         4.58E-07
E1                 14.24            12.966            13.321         54.335          24.370           9.97E-14
E2                 32.110           9.665             14.674         9.054           15.303           25.582
n_max              5.373            60.247            0.051          0.099           2.818            12.307
V_max              12.075           282.043           0.163          0.269           8.878            42.708
SSE                1.076E-05        9.097E-05         1.076E-05      1.65E-04        0.002            1.258
R-square           0.9924           0.9952            0.9924         0.9932          0.9998           0.803
Adjusted R-square  0.9917           0.9944            0.9917         0.9864          0.9997           0.803
RMSE               0.0005           0.0017            0.0005         0.0019          0.0060           0.1571

tested shale, and the value of TOC is used to represent the internal factors. All major external factors of pressure, temperature, and moisture were also provided by Beaton [10-12]. Therefore, a total of 352 data points from Beaton [10-12] with four variables including pressure, temperature, moisture, and TOC are used as the input (Supplement 1). In this study, four types of popular machine learning algorithms are used, including extreme gradient boosting (XGBoost), artificial neural network (ANN), random forest (RF), and support vector machine (SVM). Brief introductions to these algorithms are given below; their underlying equations can be found in the cited publications [28,33-36].


[Fig. 3 shows six panels of adsorption (mmol/g) versus pressure (MPa) at multiple temperatures, including extrapolated curves: (a) Chen's data [8]; (b) Jiang's data [6]; (c) Li's data [7]; (d) Li's data [31]; (e) Tang's data [5]; (f) Xiong's data [30].]

Fig. 3. Performance of the unified model for different groups of data. In the legend, Mod means model results and Exp means experimental results. Grey dashed lines are generated beyond the test range using the unified model.

2.3.1. Extreme gradient boosting (XGBoost)
Developed by Chen and Guestrin [33], XGBoost is an open-source, scalable end-to-end tree boosting system that is widely used by data scientists to achieve state-of-the-art accuracy on many classification and regression problems. Compared with other implementations of gradient boosting, XGBoost has proven to provide faster and more accurate predictions. Gradient boosting is an approach in which new models are created to predict the residuals of the prior models and are then added together to make the final decision; a gradient descent algorithm is used to minimize the loss when adding new models. XGBoost is a library for developing fast, high-performance gradient boosting tree models [34].

2.3.2. Artificial neural network (ANN)
ANN is inspired by the biological neural networks of animal brains. It is a powerful machine learning algorithm for both regression and classification problems. Deep neural networks have been applied to many fields including speech recognition, machine translation, autonomous driving, etc. A typical ANN model consists of three parts: one input layer, one or several hidden layers, and one output layer [28]. In each layer, there are multiple nodes that receive values from the predecessor nodes, perform computations using activation functions, and then deliver the outputs to the successor nodes.

2.3.3. Random forest (RF)
RF, or random decision forests, is an ensemble learning method for classification, regression, and other tasks. It operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Essentially, it aims to make the trees decorrelated and prunes the trees by setting a stopping criterion for node splits [35]. RF has been successfully applied in many data science competitions and has the advantages of simplicity and interpretability.

2.3.4. Support vector machine (SVM)
SVM is a supervised learning model with associated learning

algorithms for classification and regression analysis. When SVM is applied to regression, it is also called support vector regression (SVR). SVR uses the same principles as SVM for classification, with only a few minor differences. In the case of regression, the output is a real number, and a margin of tolerance is set in the approximation. The main purpose is the same in both cases: to minimize error by individualizing the hyperplane which maximizes the margin [36].
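To make the four model families above concrete, the toy sketch below fits each to synthetic data. scikit-learn is an assumed implementation choice (the paper does not name its software), and scikit-learn's GradientBoostingRegressor is used here as a stand-in for XGBoost, which lives in a separate library; the data are random placeholders.

```python
# Toy comparison of the four regressor families on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))   # placeholders for (P, T, moisture, TOC)
y = X @ np.array([2.0, -1.0, 0.5, 1.5]) + rng.normal(0.0, 0.05, 200)

models = {
    "RF": RandomForestRegressor(n_estimators=30, max_depth=20, random_state=0),
    "SVM": SVR(kernel="rbf", C=10.0, gamma=1.0),
    "ANN": MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0),
    "GBT (XGBoost stand-in)": GradientBoostingRegressor(
        n_estimators=100, max_depth=4, learning_rate=0.1, random_state=0),
}
# Training R^2 for each family
train_r2 = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
```

In the paper's actual workflow, the XGBoost model would come from the dedicated xgboost package rather than this stand-in.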

2.4. Evaluation metrics

There are many evaluation metrics, such as the root mean squared error (RMSE) and the coefficient of determination (R2). These metrics offer complementary information for evaluating the performance of the optimization model. In this paper, four metrics are used.

RMSE has the following expression:

RMSE = sqrt[ (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)^2 ]   (10)

The sum of squared errors (SSE) is the sum of the squared differences between each observation and its prediction:

SSE = Σ_{i=1}^{n} (y_i - ŷ_i)^2   (11)

The coefficient of determination (R2, or R-square) has the following expression:

R2 = 1 - Σ_i (y_i - ŷ_i)^2 / Σ_i (y_i - ȳ)^2   (12)

Fig. 5. Division of training set and testing set in the machine learning optimization.

3. Results and analysis

3.1. Results of traditional pressure/density dependent isothermal models

For isothermal models, the related parameters are different for various types of shale and various temperatures. To have a comprehensive evaluation and comparison of the isothermal models, 20 groups of isothermal data from several different studies [5-8,30,31] are used (Supplement 2). Table 2 is an example of the optimized parameters and performance of the isothermal models at 313.75 K using Chen's data [8]. In this case, all types of isothermal models provide satisfactory performance, but it does not mean that these models will always provide such good performance in all of the 20 optimized cases. Detailed results for the 20 cases are in Supplement 3. For a better overview of the results, the best three isothermal models are listed for all cases (Table 3). The best three models differ from case to case, as summarized in Fig. 1. According to Fig. 1, Model 8 provides the best performance in most cases compared with the other models. Therefore, for the cases studied here, the modified DR models provide the best performance among these eight isothermal models. The performance of Model 8 for these cases is plotted in Fig. 2, which shows that the predictions are very close to the measurements.

3.2. Results of the pressure and temperature unified model

For the pressure and temperature unified model, the isothermal condition is not required. Therefore, the same data used for the classical models [5-8,30,31] are used for optimization, giving six groups of data. Using a global fitting method [5], the parameters and performance for these six groups of data are presented in Table 4. According to the evaluation metrics, the unified model shows excellent performance for the first five groups of data, with R-squares as high as 0.99. This is an excellent performance for engineering applications. The only exception is the last group of data [30], which only has an R-square of 0.803. The relatively poor performance may be caused by errors in the experimental procedure or by special characteristics of the shale samples used in those tests.

Fig. 3 shows the performance of the unified model for all six groups of data. In the first five cases, the prediction curves are very close to the
Table 5
Hyperparameter tuning for different machine learning algorithms.

Model     Tuned hyperparameter       Range                                   Best hyperparameter
SVM       Kernel                     linear, rbf, poly                       rbf
          Regularization term C      10^-1, 1, 10, 10^2                      10
          Kernel coefficient gamma   10^-1, 1, 5, 10                         1
RF        Number of trees            5, 10, 30, 50, 100, 200                 30
          Max depth                  5, 10, 15, 20, 30                       20
          Min sample split           2, 5, 10, 15, 20                        2
XGBoost   Number of boosted trees    5, 10, 50, 100, 200                     10
          Max depth                  3, 4, 5, 7, 9                           4
          Learning rate              10^-3, 10^-2, 10^-1, 1                  10^-2
          Min child weight           1, 3, 5, 7, 9                           3
ANN       Optimizer                  Adam, SGD, Adagrad                      Adagrad
          Kernel initializer         Uniform, Glorot_uniform                 Uniform
          Activation                 Relu, linear, softmax, tanh             Relu
          Learning rate              0.0001, 0.001, 0.01, 0.1                0.01
          Gradient descent           0.000001, 0.00001, 0.0001, 0.001, 0.01  0.00001

Fig. 4. Machine learning optimization workflow.
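Step (3) of the workflow, cross-validation with a grid search over hyperparameters, can be sketched as follows for the RF model, with a grid in the spirit of Table 5. scikit-learn's GridSearchCV is an assumed implementation, and the feature matrix below is a random placeholder for the (pressure, temperature, moisture, TOC) inputs.

```python
# Grid search with cross-validation over RF hyperparameters (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.uniform(size=(150, 4))                 # placeholder (P, T, moisture, TOC)
y = X.sum(axis=1) + rng.normal(0.0, 0.1, 150)  # placeholder adsorption target

param_grid = {
    "n_estimators": [5, 10, 30, 50],
    "max_depth": [5, 10, 20],
    "min_samples_split": [2, 5, 10],
}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=5, scoring="neg_root_mean_squared_error")
search.fit(X, y)
best_params = search.best_params_   # analogous to the "Best" column of Table 5
```

Each algorithm in Table 5 gets its own grid; the best combination is then used for the final training run.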


[Fig. 6 shows four panels: (1) artificial neural network; (2) random forest; (3) support vector machine; (4) XGBoost.]

Fig. 6. Learning curves of different machine learning algorithms using Beaton's [10-12] data.

experimental data (points with different colors). Because both temperature and pressure are included in the model, this model has been successfully proven to make a breakthrough beyond isothermal conditions. Another advantage of this unified model is that it can extrapolate the predictions beyond the test ranges. These extrapolations are shown as grey dashed curves in Fig. 3. For instance, in Fig. 3(a), reasonable predictions can be generated at temperatures of 388.75 K, 408.75 K, and 428.75 K using the optimized parameters. This feature cannot be realized by any isothermal model. Hence, as mentioned by Tang [5], the unified model is not only superior to other available models in terms of interpreting the observed phenomena, but can also extrapolate adsorption isotherms beyond the test data.

3.3. Machine learning results

As mentioned before, Beaton's data [10-12] are used for the machine learning optimization. All major external factors such as pressure, temperature, and moisture were also provided by Beaton [10-12]. Therefore, in our machine learning optimization, four variables including pressure, temperature, moisture, and TOC are used as input. There is no 'free lunch' in machine learning or other data science optimization, and each algorithm will be suitable for a different data set. In this paper, four machine learning algorithms are used for optimization: XGBoost, ANN, RF, and SVM. To evaluate the performance of these algorithms, a standard machine learning optimization workflow (Fig. 4) is implemented: (1) gather enough data points and preprocess the data, including unit conversion, outlier detection, etc.; (2) divide the data set into training and testing data sets; usually, the training data set is further divided into a training set and a validation set; (3) take into account that some parameters cannot be learned by machine learning algorithms and should be set before the learning process begins; these parameters are called hyperparameters and can affect model performance [28,33-36], so it is necessary to perform cross-validation and a grid search of hyperparameters using the training data set; (4) use the optimized hyperparameters to train the model and generate a learning curve; (5) use the trained model to evaluate the testing data; (6) compare the performance of the different algorithms using proper evaluation metrics, and choose the optimal algorithm; (7) check the robustness of the optimal model by re-dividing the training data and testing data, to see if the same optimal algorithm is obtained.

In the first step, we can plot all the data and delete the outliers. In our work, the data are from high-precision indoor adsorption measurements, so no outliers are found. In the second step, we assign 80% of the data set as the training data and the remaining 20% as the testing data, which is a common rule in machine learning. Note that the data are randomly assigned, and we should guarantee that both the training set and the testing set cover the whole range of the data set [37]. Fig. 5 shows the division of the training set and testing set from the whole data set. Both the testing set and the training set span the whole range of the data set, which avoids data bias. Cross-validation has been performed for each of these four algorithms, and the best hyperparameters are presented in Table 5. The ranges of hyperparameters are based on common values used by previous researchers [28,33-36]. Using the best hyperparameters, the learning curve can be plotted to show the performance of the training process. The performance measure can be the loss function that is being optimized to train the model (such as logarithmic loss), or an external metric of interest to the problem (such as RMSE). For different algorithms, we may use different metrics in the learning curves, and performance may be measured in different styles. For example, ANN is measured by optimization steps (epochs), whereas random forest is measured by the number of samples.
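Steps (2) and (4) of the workflow, the 80/20 random split and the learning curve, might look like the sketch below with scikit-learn utilities (an assumed tool choice; the data are placeholders for the 352-point Beaton dataset).

```python
# Step (2): random 80/20 train/test split; step (4): learning curve.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve, train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 4))   # placeholder (P, T, moisture, TOC) features
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + rng.normal(0.0, 0.05, 200)

# 80% training / 20% testing, randomly assigned
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Training vs. validation score as the training-set size grows (5-fold CV)
sizes, train_scores, val_scores = learning_curve(
    RandomForestRegressor(n_estimators=30, random_state=0),
    X_tr, y_tr, cv=5, train_sizes=np.linspace(0.2, 1.0, 5))
```

Plotting the mean of `train_scores` and `val_scores` against `sizes` gives curves like those in Fig. 6; a small gap between the two at the largest size indicates the bias-variance tradeoff discussed below.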


[Fig. 7 shows eight panels of predicted versus experimental adsorption: (1) ANN training; (2) ANN testing; (3) RF training; (4) RF testing; (5) SVM training; (6) SVM testing; (7) XGBoost training; (8) XGBoost testing.]

Fig. 7. Training and testing performance of different machine learning algorithms using Beaton's [10-12] data.

A bias-variance tradeoff is desired. As shown in Fig. 6, both ANN and XGBoost provide good performance because the training score and validation score are close to each other as the learning ends. For SVM and RF, the validation curve is still decreasing; if more training data were available, these models would perform better.

Fig. 7 shows the optimization results of these four algorithms using Beaton's data. In each plot, the x-axis is the adsorption from the experiments, and the y-axis is the predicted adsorption. The black dotted

Table 6
Evaluation metrics of different machine learning algorithms.

                 Training set                            Testing set
                 XGBoost  ANN     RF      SVM            XGBoost  ANN     RF      SVM
MAE (scf/ton)    0.0068   0.0710  0.0291  0.0419         0.0358   0.1143  0.1607  0.1457
RMSE (scf/ton)   0.0001   0.0140  0.0032  0.0093         0.0036   0.0294  0.0631  0.0548
R2               0.9995   0.9384  0.9861  0.9592         0.9886   0.8983  0.7815  0.8101
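The metrics reported in Table 6 and defined in Eqs. (10)-(12) can be written out directly; MAE (mean absolute error) is included alongside them. This is a minimal NumPy sketch with made-up example values, not data from the paper.

```python
# Direct implementations of the evaluation metrics (illustrative values).
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))                  # mean absolute error

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))          # Eq. (10)

def sse(y, y_hat):
    return np.sum((y - y_hat) ** 2)                    # Eq. (11)

def r2(y, y_hat):
    return 1.0 - sse(y, y_hat) / np.sum((y - np.mean(y)) ** 2)  # Eq. (12)

y = np.array([1.0, 2.0, 3.0, 4.0])       # "experimental" adsorption
y_hat = np.array([1.1, 1.9, 3.2, 3.8])   # "predicted" adsorption
```

Evaluating these functions on each algorithm's training and testing predictions yields a comparison table of the same shape as Table 6.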

extrapolation capability. However, the experimental data used for machine learning are from Beaton's study [10-12], which are unavailable for extrapolation. Furthermore, it is also not practical to use data from other studies to verify our experimental data. One reason is that not all researchers provide complete experimental and sample conditions (e.g., pressure, temperature, moisture, TOC). Another reason is that the shale properties in other studies may differ from the shale properties in Beaton's study, which can significantly affect the methane adsorption. To make use of other data for validation, it is suggested to include their shale data as an input; however, this requires further investigation. Therefore, experimental data to verify the extrapolation are unavailable in this study. Nevertheless, the trend is found to be reasonable based on previous studies [4,10-12,28]. Note that if the experimental scale is large enough, it is possible to avoid the extrapolation.

Fig. 8. Performance of XGBoost model.


4. Discussion and implications

line is the 100% agreement line, and the points are optimized results. Through the above analysis, the classical isothermal models have
The closer the points are to the black dashed line, the more accurate the strong limitations in predicting adsorption of shale gas. Although they
prediction of the machine learning algorithm. To quantitatively com- can provide relatively accurate results for a certain temperature, they
pare the performance of these machine learning algorithms, results of are highly dependent on experimental results and are restricted by both
evaluation metrics are presented in Table 6. For the training set, all four temperature and types of shale. The pressure and temperature unified
algorithms provide good predictions (R2 > 0.94) and XGBoost pro- model can break the limit of the isothermal condition, and it can ex-
vides the best performance (R2 1). For the testing set, XGBoost also trapolate the predictions beyond the test ranges of temperature.
provides the best performance with R2 of 0.99, which is much better However, it is still limited to one specific type of shale. There are nu-
than the other three algorithms. The performance of the trained merous types of shales in the world, and it will be perfect to find a
XGBoost model is shown in Fig. 8 and the predictions are highly similar model that can be used to predict the adsorption behavior of any given
to experiments. To further check the robustness of the algorithm, we re- type of shale. But this kind of unique model is kind of unrealistic or
divide the training data and testing data for another four times. Then unwise because the properties of different blocks of shale can be sig-
we provide the average values of performance of different algorithms, nificantly different, and there are many properties that can influence
shown in Table 7. Still, we can see that XGBoost is the optimal algo- the adsorption results. Using machine learning methods, a prediction
rithm. model is still realistic for certain blocks of shale if a sufficient amount of
As mentioned by Tang [5], a good model should have the prediction data is available. In this study, using the comprehensive data from
capability beyond the test data. To further test the XGBoost model, the Beaton [10–12], the XGBoost provides the best performance, and it can
predictions are extrapolated beyond test ranges. Fig. 9 (a) and Fig. 9(b) successfully break the limit of temperature and shale type. Also, it
show the extrapolation of the temperature. The results are reasonable provides accurate extrapolations beyond test ranges of both tempera-
because, with higher temperatures, the adsorption content is lower ture and shale type. In other words, using the trained machine learning
[28]. Fig. 9(c) and (d) show the extrapolation of TOC, and our pre- model, theoretically, researchers can predict the adsorption behavior
dictions are consistent with early researchers [4]. The higher TOC, the for a certain type of shale. Furthermore, the machine learning model
larger adsorption amounts of shale gas. Fig. 9(e) and (f) show the ex- can be conveniently updated. If a new dataset is provided, the current
trapolation of moisture and, as expected, the adsorption ability de- model can be upgraded with the new data set and provide broader
creases as increasing of moisture [4,10–12], which is consistent with applications. Therefore, compared with previous models (classical iso-
our knowledge. From all these cases, it is shown that the XGBoost thermal models or pressure/temperature unified models), machine
model can extrapolate the predictions beyond test ranges with rea- learning models directly build data-driven models and avoid numerous
sonable results. It is better to have experimental data to verify the theoretical assumptions and mathematical derivations.

Table 7
Average evaluation metrics of different machine learning algorithms over five random train/test splits.

                Training set                          Testing set
                XGBoost   ANN      RF       SVM       XGBoost   ANN      RF       SVM
MAE (scf/ton)   0.0051    0.2289   0.0135   0.0855    0.0404    0.3938   0.1662   0.2119
RMSE (scf/ton)  0.0001    0.0999   0.0013   0.0122    0.0053    0.3005   0.0601   0.1319
R2              0.9997    0.9648   0.9966   0.9755    0.9781    0.9186   0.9088   0.8411


Fig. 9. Extrapolation cases using the trained XGBoost model: (a), (b) extrapolation of temperature (Cases I and II); (c), (d) extrapolation of TOC (Cases I and II); (e), (f) extrapolation of moisture (Cases I and II).

Tang [5] mentioned three criteria for comparing adsorption models of methane in shale: the goodness-of-fit of the adsorption model in describing experimental raw data, the interpretation of the observed test phenomena, and the prediction capability of the adsorption model beyond the test data. The machine learning model proposed in this paper meets all three criteria: it provides accurate predictions on the training and testing data, and it provides reasonable extrapolation. From the perspective of industrial applications, however, these three criteria are not enough. First, for a new shale formation, if engineers need to estimate the adsorption behavior of the target formation using the classical models or the pressure/temperature unified model, they still need to perform time-consuming and labor-intensive field coring, storage and transportation, and in-door adsorption testing. Second, the properties of individual shale samples in the target formation can be significantly different; the in-door tests may not represent the performance of the whole formation due to heterogeneity. Using classical models, such studies are usually limited to academic interest. Machine learning models give us the opportunity to significantly reduce the time spent on adsorption tests. Using all available historical experimental data, researchers only need to evaluate some internal properties of the target shale formation, such as the TOC, and then they can estimate the adsorption of shale gas [7,42,43]. Chen [29] used geological parameters to estimate adsorbed gas content on a large scale, and he mentioned that adsorption experiments are still


necessary for the exact adsorbed gas content of a sample. With the machine learning model, researchers can avoid too many tedious adsorption tests and still achieve engineering-level accuracy for adsorption contents. In the future, this machine learning model can be built into a larger numerical framework to optimize the production curves of shale gas [38–41].

Note that the data used here are the excess adsorption amounts; therefore, the predicted adsorption amount is also the excess adsorption amount. The absolute adsorption amount can be calculated from the predicted excess adsorption amount. At the early stage, the excess adsorption curve and the absolute adsorption curve overlap with each other; as the pore pressure increases, the excess adsorption curve deviates from the absolute adsorption curve beyond a certain pressure. The following equation is one way to calculate the absolute adsorption curve:

n_absolute = n_e · ρ_a / (ρ_a − ρ_freegas)    (13)

where ρ_a is the density of the gas in the adsorbed phase (ρ_a = 375 kg/m³ for CH4 [44]) and ρ_freegas is the density of the free gas.

The above equation does not account for the influence of the inner chemical properties on the relationship between absolute adsorption and excess adsorption. Some researchers [45] claim that the turning point could depend on the chemical properties of the given shale samples and differs for different shale samples. If this factor is considered, we may need the chemical properties of the shale samples to obtain the adsorption amount. Nevertheless, the machine learning method saves considerable time and energy on excess adsorption tests, which benefits researchers.
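A minimal implementation of Eq. (13) follows; the 0.5 scf/ton excess amount and 150 kg/m³ free-gas density in the usage line are made-up illustration values, not data from this study.

```python
RHO_ADSORBED_CH4 = 375.0  # kg/m3, adsorbed-phase methane density [44]

def absolute_adsorption(n_excess, rho_freegas, rho_adsorbed=RHO_ADSORBED_CH4):
    """Convert excess adsorption to absolute adsorption via Eq. (13):
    n_absolute = n_excess * rho_a / (rho_a - rho_freegas)."""
    if rho_freegas >= rho_adsorbed:
        raise ValueError("free-gas density must be below the adsorbed-phase density")
    return n_excess * rho_adsorbed / (rho_adsorbed - rho_freegas)

# Illustration: hypothetical excess amount of 0.5 scf/ton with free gas at 150 kg/m3
n_abs = absolute_adsorption(0.5, 150.0)
print(round(n_abs, 4))  # 0.5 * 375 / 225 = 0.8333
```

At low pressure the free-gas density is small and the correction factor approaches 1, which is why the excess and absolute curves overlap at the early stage.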

5. Conclusion

The adsorption behavior of shale gas is a significant factor in the accurate assessment of natural gas adsorption under in-situ conditions, which further influences the final determination of the geological GIP quantity and the production life of shale gas wells. This paper compares the performance of methane adsorption models for shale among traditional isothermal models, the temperature and pressure unified model, and machine learning models. From the above analysis, the following conclusions can be drawn:

(1) Eight traditional isothermal models are compared and analyzed with data from 20 cases. The modified DR models provide the best performance compared with the Langmuir-style models and the Toth models. However, these models are limited to isothermal conditions and do not have the capability of extrapolating predictions beyond the test condition. These models are also limited to a specific type of shale.
(2) The temperature and pressure unified model has been proven to make a breakthrough beyond the isothermal condition, and it can extrapolate the predictions beyond the test ranges. However, it is still limited to a specific type of shale.
(3) Four machine learning algorithms (ANN, RF, SVM, XGBoost) are applied. Results show that XGBoost provides the optimal performance for both the training dataset and the testing dataset. This machine learning model can break the limit of the isothermal condition and shale type. It can also extrapolate the performance beyond the testing range. Furthermore, the machine learning model can be conveniently updated: if a new dataset is provided, the current model can be upgraded with the larger data set and provide broader applications.
(4) Machine learning models give us an opportunity to significantly reduce time-consuming and labor-intensive work. The machine learning model can be trained using all available historical experimental data and updated whenever a new dataset is available. Using a well-trained machine learning model with a variety of inputs, researchers only need to evaluate some internal properties of the target shale formation, such as the TOC, to simulate the adsorption behavior on shale. Furthermore, this machine learning model can be built into a large numerical framework to optimize the production curves of shale gas.

CRediT authorship contribution statement

Meng Meng: Conceptualization, Methodology, Software, Investigation, Writing - original draft, Writing - review & editing. Ruizhi Zhong: Software, Validation, Writing - review & editing. Zhili Wei: Software.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.fuel.2020.118358.

References

[1] Today in Energy. U.S. natural gas production hit a new record high in 2018. 2019. https://www.eia.gov/todayinenergy/detail.php?id=38692.
[2] Jacobs T. China flexes shale muscles as CNPC ups gas output 40%. Journal of Petroleum Technology 2019. https://www.spe.org/en/jpt/jpt-article-detail/?art=4955.
[3] Tang X, Ripepi N, Stadie NP, et al. A dual-site Langmuir equation for accurate estimation of high pressure deep shale gas resources. Fuel 2016;185:10–7.
[4] Tan J, Weniger P, Krooss B, et al. Shale gas potential of the major marine shale formations in the Upper Yangtze Platform, South China, Part II: Methane sorption capacity. Fuel 2014;129:204–18.
[5] Tang X, Ripepi N, Luxbacher K, et al. Adsorption models for methane in shales: Review, comparison, and application. Energ Fuel 2017;31:10787–801.
[6] Jiang Z, Zhao L, Zhang D. Study of adsorption behavior in shale reservoirs under high pressure. J Nat Gas Sci Eng 2018;49:275–85.
[7] Li J, Chen Z, Wu K, et al. A multi-site model to determine supercritical methane adsorption in energetically heterogeneous shales. Chem Eng J 2018;349:438–55.
[8] Chen L, Zuo L, Jiang Z, et al. Mechanisms of shale gas adsorption: Evidence from thermodynamics and kinetics study of methane adsorption on shale. Chem Eng J 2019;361:559–70.
[9] Weniger P, Kalkreuth W, Busch A, et al. High-pressure methane and carbon dioxide sorption on coal and shale samples from the Paraná Basin, Brazil. Int J Coal Geol 2010;84:190–205.
[10] Beaton AP, Pawlowicz JG, Anderson SDA, et al. Rock Eval™, total organic carbon, isotherms and organic petrography of the Colorado Group: shale gas data release. Energy Resources Conservation Board, ERCB/AGS Open File Report 2008-11; 2009. 88 p.
[11] Beaton AP, Pawlowicz JG, Anderson SDA, et al. Rock Eval™, total organic carbon and adsorption isotherms of the Duvernay and Muskwa formations in Alberta: shale gas data release. Energy Resources Conservation Board, ERCB/AGS Open File Report 2010-04; 2010. 33 p.
[12] Beaton AP, Pawlowicz JG, Anderson SDA, et al. Rock Eval™, total organic carbon and adsorption isotherms of the Montney Formation in Alberta: shale gas data release. Energy Resources Conservation Board, ERCB/AGS Open File Report 2010-05; 2010. 37 p.
[13] Dubinin MM, Radushkevich LV. The equation of the characteristic curve of the activated charcoal. Proc Union Soviet Soc Rep Acad Sci 1947;55:331–7.
[14] Sakurovs R, Day S, Weir S, Duffy G. Application of a modified Dubinin-Radushkevich equation to adsorption of gases by coals under supercritical conditions. Energy Fuels 2007;21(2):992–7.
[15] Hutson ND, Yang RT. Theoretical basis for the Dubinin-Radushkevitch (DR) adsorption isotherm equation. Adsorption 1997;3:189–95.
[16] Zhou S, Xue H, Ning Y, et al. Experimental study of supercritical methane adsorption in Longmaxi shale: insights into the density of adsorbed methane. Fuel 2018;211:140–8.
[17] Song X, Lü X, Shen Y, et al. A modified supercritical Dubinin-Radushkevich model for the accurate estimation of high pressure methane adsorption on shales. Int J Coal Geol 2018;193:1–15.
[17] Song X, Lü X, Shen Y, et al. A modified supercritical Dubinin-Radushkevich model


[18] Butt HJ, Graf K, Kappl M. Physics and chemistry of interfaces. John Wiley & Sons; 2013.
[19] Masel RI. Principles of adsorption and reaction on solid surfaces. John Wiley & Sons; 1996.
[20] Abousleiman Y, Nguyen V. Poromechanics response of inclined wellbore geometry in fractured porous media. J Eng Mech 2005;131(11):1170–83.
[21] Zhang J, Bai M, Roegiers JC. Dual-porosity poroelastic analyses of wellbore stability. Inter J Rock Mech Min Sci 2003;40(4):473–83.
[22] Meng M, Baldino S, Miska S, et al. Wellbore stability in naturally fractured formations featuring dual-porosity/single-permeability and finite radial fluid discharge. J Petrol Sci Eng 2019;174:790–803.
[23] Boyd S, Parikh N, Chu E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 2011;3(1):1–122.
[24] Huang G, Huang GB, Song S, You K. Trends in extreme learning machines: a review. Neural Netw 2015;61:32–48.
[25] Huang GB, Zhou H, Ding X, et al. Extreme learning machine for regression and multiclass classification. IEEE Trans Syst, Man, Cybern, Part B (Cybern) 2012;42(2):513–29.
[26] Liaw A, Wiener M. Classification and regression by randomForest. R News 2002;2(3):18–22.
[27] Rasmussen CE. Gaussian processes in machine learning. Summer School on Machine Learning. Berlin, Heidelberg: Springer; 2003.
[28] Meng M, Qiu Z, Zhong R, et al. Adsorption characteristics of supercritical CO2/CH4 on different types of coal and a machine learning approach. Chem Eng J 2019;368C:847–64.
[29] Chen Y, Jiang S, Zhang D, et al. An adsorbed gas estimation model for shale gas reservoirs via statistical learning. Appl Energ 2017;197:327–41.
[30] Xiong W, Zuo L, Luo L, et al. Methane adsorption on shale under high temperature and high pressure of reservoir condition: Experiments and supercritical adsorption modeling. Adsorpt Sci Technol 2016;34(2–3):193–211.
[31] Li J, Zhou S, Gaus G, et al. Characterization of methane adsorption on shale and isolated kerogen from the Sichuan Basin under pressure up to 60 MPa: Experimental results and geological implications. Int J Coal Geol 2018;189:83–93.
[32] Zhao P, Mao Z, Huang Z, Zhang C. A new method for estimating total organic carbon content from well logs. AAPG Bull 2016;100(8):1311–27.
[33] Chen T, Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016.
[34] Brownlee J. XGBoost with Python. Machine Learning Mastery; 2016.
[35] Breiman L. Random forests. Machine Learning 2001;45(1):5–32.
[36] Steinwart I, Christmann A. Support vector machines. Springer Science & Business Media; 2008.
[37] Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: concepts, tools, and techniques to build intelligent systems. O'Reilly Media; 2019.
[38] Chen B, Harp DR, Lin Y, et al. Geologic CO2 sequestration monitoring design: a machine learning and uncertainty quantification based approach. Appl Energ 2018;225:332–45.
[39] Cui G, Wang Y, Rui Z, et al. Assessing the combined influence of fluid-rock interactions on reservoir properties and injectivity during CO2 storage in saline aquifers. Energ 2018;155:281–96.
[40] Zhang Y, Lebedev M, Jing Y, et al. In-situ X-ray micro-computed tomography imaging of the microstructural changes in water-bearing medium rank coal by supercritical CO2 flooding. Inter J Coal Geolog 2019;203:28–35.
[41] Lebedev M, Zhang Y, Sarmadivaleh M, et al. Carbon geosequestration in limestone: pore-scale dissolution and geomechanical weakening. Int J Greenh Gas Con 2017;66:106–19.
[42] Wang S, Feng Q, Javadpour F, et al. Competitive adsorption of methane and ethane in montmorillonite nanopores of shale at supercritical conditions: a grand canonical Monte Carlo simulation study. Chem Eng J 2019;355:76–90.
[43] Li J, Wu K, Chen Z, et al. Effects of energetic heterogeneity on gas adsorption and gas storage in geologic shale systems. Appl Energ 2019;251:113368.
[44] Zhang C, Zhang J. Coal Chemistry. Coal Industry Press; 2012.
[45] Tian Y, Yan C, Jin Z. Characterization of methane excess and absolute adsorption in various clay nanopores from molecular simulation. Sci Rep 2017;7(1):12040.
