
2019 Fourth International Conference on Advances in Computational Tools for Engineering Applications (ACTEA)

Accurate Prediction of Gas Compressibility Factor
using Kernel Ridge Regression

Maher Maalouf∗, Naji Khoury†, Dirar Homouz‡ and Kyriaki Polychronopoulou§
∗ Khalifa University, P.O. Box 127788, Abu Dhabi, UAE
Email: maher.maalouf@ku.ac.ae
† Notre Dame University - Louaize, Beirut, Lebanon
Email: nnkhoury@gmail.com
‡ Khalifa University, P.O. Box 127788, Abu Dhabi, UAE
Email: dirar.homouz@ku.ac.ae
§ Khalifa University, P.O. Box 127788, Abu Dhabi, UAE
Email: kyriaki.polychrono@ku.ac.ae

Abstract—The natural gas compressibility factor (z) is one of the critical parameters in the computations used for the upstream and downstream zones of the petroleum/chemical industries. The process of obtaining accurate values for the physical and thermodynamic properties of hydrocarbons becomes more challenging for multicomponent non-ideal systems. The purpose of this work is to apply kernel ridge regression (KRR), in the form of the recently developed truncated regularized kernel ridge regression (TR-KRR) algorithm, to estimate the z-factor. Compared to support vector machines (SVM), the KRR algorithm is just as accurate, but faster.

Index Terms—kernel ridge regression, z-factor, gas compressibility factor, truncated-Newton method.

I. INTRODUCTION

The natural gas compressibility factor (z) is one of the critical parameters in the calculations used for the upstream and downstream zones of the petroleum/chemical industries, as well as in other fields such as electric power generation, where natural gas is widely used as a fuel. Natural gas is a mixture of several components with methane as the major compound; other compounds such as nitrogen, carbon dioxide, ethane, propane, and heavier hydrocarbons can also be present [1], [2], [3]. From an engineering perspective, handling processes that involve natural gas requires knowledge of the compressibility factor, which describes the deviation of a real gas from ideal-gas behavior. It is usually difficult to define the critical parameters for a natural gas mixture; for this reason, pseudo-critical properties are widely used in the petrochemical industry. Critical property values can be calculated from mixing rules. Given a known composition, Kay [4], SBV [5] and SSBV (SBV as modified by Sutton [6]) are mixing rules that have been widely used. In general, a mixing rule does not follow a simple average of the concentrations of each component in the mixture. Moreover, the different components of natural gas can be classified by their geometry and polarity. In particular, small, spherical molecules (e.g., CH4) are well fitted by a two-constant law of corresponding states, whereas non-spherical and weakly polar molecules require correlations with a third parameter, e.g., the acentric factor ω. Given the above criteria and considerations, the composition selected for the accurate prediction of z was sweet dry natural gas. These data cover the range of 35° to 250°F and 1,000 to 10,000 lb per sq. inch [7].

The process of obtaining accurate values for the physical and thermodynamic properties of hydrocarbons becomes more challenging for multicomponent non-ideal systems [1], [2], [3]. There are two families of models in thermodynamics for phase-equilibrium predictions and calculations, namely equation-of-state models and liquid activity-coefficient models. An activity-coefficient model can describe mixtures of any complexity, but only as long as the liquid is well below its critical temperature. On the other hand, any mathematical formulation relating V, P, T and composition is called an equation of state (EoS). Although many equations of state have been proposed, almost all are empirical. The most common sources of z-factor values are experimental measurements, equations of state (EoS) and empirical correlations [1], [2], [3]. In the literature there are more than twenty two-variable correlations for calculating the z-factor. However, these correlations are complex, require specifying an initial value, and entail long computational times.

There are several popular regression (prediction) algorithms, such as artificial neural networks (ANNs) [8] and support vector machines (SVMs), in addition to the least squares (LS) method. An ANN mimics biological neurons to reproduce intelligent data-processing techniques such as pattern recognition, classification and regression, using processing units called artificial neurons. The SVM was first introduced by Vapnik [9] as a potential alternative to conventional ANNs and is a kernel-based method for classification, regression and function approximation [10], [11]. The method has grown in popularity and has been used in various research areas due to its stability and efficiency compared to other techniques.

The ordinary least squares (LS) algorithm is a simple and powerful technique for linear regression, and it can be programmed and implemented with little effort. Thus, one would benefit from extending LS to non-linear

978-1-7281-0130-9/19/$31.00 ©2019 IEEE



regression problems. This can be done by taking the kernel approach used in SVM and applying it to the LS method. This combination gives rise to kernel ridge regression (KRR), an easy-to-use and powerful nonlinear regression method. Recently, Maalouf and Homouz [12] significantly improved the efficiency of KRR by using the truncated Newton method for parameter estimation.

The objective of this work is to apply the recently developed kernel ridge regression (KRR) method to estimate the z-factor. Determining the z-factor is both resource- and time-consuming, and hence it is important to utilize a fast and accurate technique to estimate it.

II. TRUNCATED-REGULARIZED KERNEL RIDGE REGRESSION (TR-KRR) ALGORITHM

Our objective is to estimate a real-valued functional outcome in the form of a vector y using input data X through a function y = f(X). The simplest form of this function is the linear model, which can be expressed in matrix form as

y = Xβ + ε, (1)

where β is the vector of coefficients and ε is a random error. The data X ∈ R^(N×d) consist of N rows and d variables, with y as the outcome vector; each row is associated with an output y_i.

The least squares (LS) method is widely used for estimating β [13], [14] by minimizing the residual sum of squares (RSS),

RSS = Σ_(i=1)^N ε_i² = εᵀε = (y − Xβ)ᵀ(y − Xβ). (2)

As long as the matrix (XᵀX) is non-singular, a solution exists and is given by

β̂ = (XᵀX)⁻¹Xᵀy. (3)

The LS method has some disadvantages, such as poor estimation of the regression coefficients, with unstable and large absolute values [15]; it is also limited to linear regression models. Kernel methods such as kernel ridge regression (KRR) can overcome these difficulties. KRR introduces a non-linear function φ(·) that maps the data from the input space into a higher-dimensional one, such that

φ : x ∈ R^d → φ(x) ∈ F ⊆ R^Λ. (4)

The transformation φ usually maps nonlinear relationships between y and X into a linear relationship. Although the transformations φ(·) are generally unknown, the regression solution depends mainly on the dot product in the feature space, and the kernel function K = K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ takes the place of this dot product.

Therefore, the regression model using the KRR method is

y = Kα + ε, (5)

where α is the vector of unknown parameters in the KRR model, estimated by minimizing the objective function

f(α) = (1/2)(y − Kα)ᵀ(y − Kα) + (λ/2)αᵀKα, (6)

where λ ≥ 0 is the regularization (cost) parameter. The regularization term resolves the problem of large, unstable coefficients discussed for the least-squares method.

Hence, the solution can be written with respect to α as

α = (K + λI_N)⁻¹y. (7)

Using the direct, closed-form solution (7) to find the estimate of α can be very slow if the matrix (K + λI_N) is dense [11], as it involves N equations in N unknowns and a complexity of O(N³).

Alternatively, iterative approaches such as the truncated-Newton method can be used in place of the matrix inversion. The KRR model above can be rewritten as

(K + λI_N)α = y. (8)

This is a linear system of equations with K as the kernel matrix and y as the prediction vector. The system can be solved iteratively using many numerical methods, including the conjugate gradient (CG) method. The CG method has a computational complexity of order O(N³) at worst, if the solution converges in N steps. Placing a limit on the number of iterations, however, leads to the truncated Newton method. Maalouf and Homouz [12] combined KRR with the truncated Newton method in a nearly linear CG algorithm called Truncated Regularized Kernel Ridge Regression (TR-KRR). Interested readers should refer to Maalouf and Homouz [12] for a complete description of the TR-KRR algorithm.

III. THE Z-FACTOR DATA

The data used in this analysis are similar to the data in the work of Kamyab et al. [16], and are based on digitizing the Standing-Katz chart [7], [2] for natural hydrocarbon gases. The data show the dependence of the z-factor on two variables: the pseudo-reduced pressure (Ppr) and the pseudo-reduced temperature (Tpr). The reduced pressure (or temperature) is defined as the pressure (or temperature) divided by its critical value for an individual gas. For a mixture of gases such as natural gas, the critical value is replaced with the pseudo-critical one, giving the pseudo-reduced pressure and temperature.

IV. COMPUTATIONAL RESULTS & DISCUSSION

In this study, we used both the KRR and support vector regression (SVR) methods to predict the z-factor with ten-fold cross-validation [17]. The n-fold cross-validation divides the data into n folds, leaving n − 1 folds for training while the remaining fold is used for testing the model. This procedure continues iteratively until every fold has been used for testing.
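As an illustration, the pieces described so far (a Gaussian RBF kernel, the closed-form KRR solution of Eq. (7), and n-fold cross-validation scored by MSE and R²) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the data are a smooth synthetic stand-in for the digitized (Ppr, Tpr) chart, and only the kernel width σ = 0.35 and λ = 0.01 are taken from the reported optimal parameters.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam, gamma):
    """Closed-form KRR solution alpha = (K + lam*I)^{-1} y, Eq. (7)."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma):
    """Predict via y_new = K(X_new, X_train) alpha, Eq. (5)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

def n_fold_cv(X, y, lam, gamma, n_folds=10, seed=0):
    """n-fold CV: hold each fold out once; average MSE and R^2."""
    idx = np.random.default_rng(seed).permutation(len(X))
    mses, r2s = [], []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        alpha = krr_fit(X[train], y[train], lam, gamma)
        pred = krr_predict(X[train], alpha, X[fold], gamma)
        ss_res = np.sum((y[fold] - pred) ** 2)
        ss_tot = np.sum((y[fold] - y[fold].mean()) ** 2)
        mses.append(ss_res / len(fold))
        r2s.append(1.0 - ss_res / ss_tot)
    return np.mean(mses), np.mean(r2s)

# Smooth synthetic stand-in for a digitized (Ppr, Tpr) -> z surface.
rng = np.random.default_rng(1)
X_raw = rng.uniform([0.2, 1.05], [15.0, 3.0], size=(300, 2))
y = 1.0 - 0.05 * X_raw[:, 0] / X_raw[:, 1] + 0.002 * X_raw[:, 0] ** 2

# Normalize only the independent variables (zero mean, unit std).
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

sigma = 0.35                                  # reported TR-KRR kernel width
mse, r2 = n_fold_cv(X, y, lam=0.01, gamma=1.0 / (2 * sigma ** 2))
```

The same loop applies unchanged to the real digitized chart data; only the data-loading lines would differ.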

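The truncated CG solve at the heart of TR-KRR, iterating on (K + λI_N)α = y of Eq. (8) with a cap on the number of iterations, can be sketched as a standard linear CG loop in NumPy. The recurrences below follow the textbook method rather than transcribing Algorithm 1 line by line, and the kernel matrix is a small synthetic stand-in:

```python
import numpy as np

def conjugate_gradient(A, b, eps=1e-6, max_iters=200):
    """Linear CG for A x = b with A symmetric positive definite.
    Capping max_iters gives the 'truncated' variant."""
    x = np.zeros_like(b)
    r = b - A @ x                      # initialize the residual
    d = r.copy()                       # initialize the search direction
    for _ in range(max_iters):
        rr = r @ r
        if rr <= eps ** 2:             # stop once ||r||^2 is small enough
            break
        Ad = A @ d
        s = rr / (d @ Ad)              # optimal step length
        x = x + s * d                  # update the approximate solution
        r = r - s * Ad                 # update the residual
        zeta = (r @ r) / rr            # A-conjugacy enforcer
        d = r + zeta * d               # update the search direction
    return x

# Solve (K + lam*I) alpha = y for a small synthetic SPD kernel matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((30, 30))
K = B @ B.T                            # symmetric positive semidefinite
A = K + 0.1 * np.eye(30)               # lam = 0.1 makes it positive definite
y = rng.standard_normal(30)
alpha = conjugate_gradient(A, y)
```

Swapping a direct solve for this routine and lowering max_iters trades a little accuracy for speed, which is the trade-off that truncation exploits.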
Algorithm 1: Linear CG for computing α̂, where A = K + λI_N and b = y
Data: A, b, α̂(0)
Result: α̂ such that Aα̂ = b
1  begin
2    r(0) = b − Aα̂(0)                            /* Initialize the residual */
3    d(0) = r(0)                                 /* Initialize the search direction */
4    c = 0
5    while ||r(c)||² > ε² and c ≤ Max CG Iterations do
6      s(c) = r(c)ᵀr(c) / d(c)ᵀAd(c)             /* Compute the optimal step length */
7      α̂(c+1) = α̂(c) + s(c) d(c)                /* Update the approximate solution */
8      r(c+1) = r(c) − s(c) Ad(c)                /* Update the residual */
9      ζ(c) = r(c+1)ᵀr(c+1) / r(c)ᵀr(c)          /* Update the A-conjugacy enforcer */
10     d(c+1) = r(c+1) + ζ(c) d(c)               /* Update the search direction */
11     c = c + 1

The average of all the results is the overall accuracy. Furthermore, the Gaussian radial basis function (RBF) kernel,

K(x_i, x_j) = e^(−||x_i − x_j||²/(2σ²)) = e^(−γ||x_i − x_j||²), (9)

is chosen for both methods, with kernel width σ. The accuracy of the algorithms is computed using the ten-fold cross-validation average of both the coefficient of determination (R²) and the mean-squared error (MSE). The MSE measures the prediction error and is defined as the average of the squared errors between the actual data and the values predicted by the model.

The data in our experiments are normalized per variable, such that each variable vector has a mean of zero and a standard deviation of one; the normalization is applied only to the independent variables. The KRR method is implemented on a personal computer with 8 GiB of RAM, and for SVR the LIBSVM toolbox for MATLAB [18] is used on the same machine.

The final, optimal parameters with their corresponding accuracy and computational time are shown in Tables I and II, respectively. Table I provides the optimal parameter values for both regression models. As mentioned earlier, the parameters of both models can be either user-defined or found through a grid search. It should be noted that different parameter values, especially for SVR, can affect the computational speed and the accuracy of the algorithm. Table II shows that the two methods produce very high accuracy for the parameter values shown. Regarding computational time, the TR-KRR algorithm is almost twice as fast as SVR, which agrees with the findings of Maalouf and Homouz [12] that the biggest advantage of the KRR method lies in its high computational efficiency over SVR. In terms of accuracy, both methods produce very close results. In addition, both models must be optimized over their parameters; given that KRR is faster than SVR, those parameters can be found much faster using KRR.

TABLE I: Optimal parameters using ten-fold cross-validation for both TR-KRR and SVR, where γ is the RBF kernel width and C is the regularization parameter of SVR.

       TR-KRR              SVR
  λ        σ          C      γ      ε
  0.01     0.35       100    4.5    0.1

TABLE II: Ten-fold CV accuracy and CPU time (in seconds) for both TR-KRR and SVR.

       TR-KRR                        SVR
  R²      MSE       CPU Time    R²      MSE       CPU Time
  0.997   0.00039   3.33        0.997   0.00039   5.56

V. CONCLUSIONS

We have presented the truncated regularized kernel ridge regression (TR-KRR) algorithm as an alternative to EoS and empirical-correlation approaches for calculating the gas compressibility factor (z-factor). KRR predicts the z-factor by building a nonlinear regression model in terms of pressure and temperature. The performance of the KRR algorithm is compared to that of the state-of-the-art data-mining method, support vector machines (SVM). KRR is computationally much more efficient than the support vector regression (SVR) method, while both methods calculate the z-factor accurately; it can therefore be concluded that KRR is a time-efficient method for large data sets.

REFERENCES

[1] M. B. Standing, Volumetric and Phase Behavior of Oil Field Hydrocarbon Systems: PVT for Engineers. California Research Corp., 1951.
[2] D. L. V. Katz, Handbook of Natural Gas Engineering. New York: McGraw-Hill, 1959.
[3] J. W. Amyx, D. M. Bass, and R. L. Whiting, Petroleum Reservoir Engineering: Physical Properties. McGraw-Hill College, 1960, vol. 1.
[4] W. Kay, "Gases and vapors at high temperature and pressure - density of hydrocarbon," Industrial & Engineering Chemistry, vol. 28, no. 9, pp. 1014–1019, 1936.
[5] W. Stewart, S. Burkhardt, and D. Voo, "Prediction of pseudo-critical parameters for mixtures," in AIChE Meeting, Kansas City, MO, May, vol. 18, 1959.
[6] R. Sutton et al., "Compressibility factors for high-molecular-weight reservoir gases," in SPE Annual Technical Conference and Exhibition. Society of Petroleum Engineers, 1985.
[7] M. Standing and D. Katz, "Density of natural gases," Trans., AIME, vol. 146, pp. 140–149, 1942.
[8] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[9] V. Vapnik, The Nature of Statistical Learning. Springer, NY, 1995.

[10] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
[11] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[12] M. Maalouf and D. Homouz, "Kernel ridge regression using truncated Newton method," Knowledge-Based Systems, vol. 71, pp. 339–344, 2014.
[13] J. M. Lewis, S. Lakshmivarahan, and S. Dhall, Dynamic Data Assimilation: A Least Squares Approach. Cambridge University Press, 2006.
[14] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis, 5th ed. Wiley, 2012.
[15] L. C. C. Montenegro, E. A. Colosimo, G. M. Cordeiro, and F. R. B. Cruz, "Bias correction in the Cox regression model," Journal of Statistical Computation and Simulation, vol. 74, no. 5, pp. 379–386, 2004.
[16] M. Kamyab, J. H. Sampaio, F. Qanbari, and A. W. Eustes, "Using artificial neural networks to estimate the z-factor for natural hydrocarbon gases," Journal of Petroleum Science and Engineering, vol. 73, no. 3, pp. 248–257, 2010.
[17] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall/CRC, 1994.
[18] C.-C. Chang and C.-J. Lin, LIBSVM: A Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
