
Applied Soft Computing 26 (2015) 483–496


Improving the prediction of petroleum reservoir characterization with a stacked generalization ensemble model of support vector machines

Fatai Anifowose a,∗, Jane Labadin a, Abdulazeez Abdulraheem b

a Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
b Department of Petroleum Engineering, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

Article history:
Received 21 March 2013
Received in revised form 12 September 2014
Accepted 14 October 2014
Available online 23 October 2014

Keywords:
Stacked generalization ensemble
Support vector machines
Regularization parameter
Porosity
Permeability

Abstract

The ensemble learning paradigm has proved relevant to solving some of the most challenging industrial problems. Despite its successful application, especially in bioinformatics, the petroleum industry has not benefited enough from the promises of this machine learning technology. The petroleum industry, with its persistent quest for high-performance predictive models, is in great need of this new learning methodology. A marginal improvement in the prediction indices of petroleum reservoir properties could have a huge positive impact on the success of exploration, drilling and the overall reservoir management portfolio. Support vector machines (SVM) is one of the promising machine learning tools that have performed excellently well in most prediction problems. However, its performance is a function of the prudent choice of its tuning parameters, most especially the regularization parameter, C. Reports have shown that this parameter has a significant impact on the performance of SVM. Understandably, no specific value has been recommended for it. This paper proposes a stacked generalization ensemble model of SVM that incorporates different expert opinions on the optimal values of this parameter in the prediction of porosity and permeability of petroleum reservoirs using datasets from diverse geological formations. The performance of the proposed SVM ensemble was compared to that of the conventional SVM technique, another SVM implemented with the bagging method, and the Random Forest technique. The results showed that the proposed ensemble model, in most cases, outperformed the others with the highest correlation coefficient and the lowest root mean-squared and mean absolute errors. The study indicated that there is a great potential for ensemble learning in petroleum reservoir characterization to improve the accuracy of reservoir property predictions for more successful exploration and increased production of petroleum resources. The results also confirmed that ensemble models perform better than the conventional SVM implementation.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

The ensemble learning paradigm is the most recent Computational Intelligence (CI) tool for combining a "mixture of experts". It has proved to be relevant in solving most challenging industrial problems. Its superior performance over the conventional method of learning individual techniques has been confirmed when applied on classification and regression problems. The ensemble learning paradigm is an advancement in supervised machine learning technology. While the latter searches for the best hypothesis among all possible hypotheses that describe the solution to a problem, the former combines the best hypotheses of different instances of the base learner and its associated hypotheses. The ensemble learning paradigm has gained much ground with classification problems in many fields. However, it is still a new technology whose great benefit is still waiting to be tapped in the petroleum industry. The ensemble learning methodology is a close emulation of the human socio-cultural behavior of seeking several people's opinions before making any important decision [1]. With the reports of the successful application of ensemble modeling over their individual base learners in other areas [2–6], the petroleum industry is in dire need of this new modeling approach in the petroleum reservoir characterization business.

A lot of data is being generated and acquired in the petroleum industry due to the proliferation of various sensor-based logging tools such as Wireline, Open-Hole, Logging-While-Drilling, Measurement-While-Drilling, and seismic measurements of increasing dimensions. Due to the high dimensionality that may be involved in the data acquired through these systems, the ensemble methodology is most ideal for extracting useful knowledge out of them without compromising expert opinions and model performance. For those outside the facilities that may not have access to these voluminous data, the ensemble methodology is still the ideal technique to manage the little data that may be available to them. The ensemble learning methodology is ideal for handling both cases of too much data and too little data [7]. Ensemble models have the capability to combine different architectures of their base models, diverse data sampling methodologies, different recommended best input features, and various optimized parameters obtained from different experts in the optimization of estimates and predictions of petroleum reservoir properties.

∗ Corresponding author. Present address: The Research Institute, Center for Petroleum & Minerals, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia. Tel.: +966 138604383; mobile: +966 532649740.
E-mail addresses: fanifowose@gmail.com (F. Anifowose), ljane@fit.unimas.my (J. Labadin), aazeez@kfupm.edu.sa (A. Abdulraheem).

http://dx.doi.org/10.1016/j.asoc.2014.10.017

CI techniques have been well applied in the petroleum industry over the years, especially in reservoir characterization, but with a pace that does not match the rate of advancement and the dynamics of the technology. Interestingly, researchers in the petroleum industry have moved progressively from the use of empirical correlations and linear regression models to the use of CI and machine learning techniques [8]. However, the application of CI in the petroleum industry has mainly been limited to Artificial Neural Networks (ANN) and Fuzzy Logic [9–12], with very few works in the area of hybrid CI modeling [13–18] and almost nothing yet in the application of ensemble models.

Reservoir characterization, an essential process in the petroleum industry for estimating various properties of petroleum reservoirs, needs the ensemble learning methodology to improve the accuracy of predictions that are important for the qualitative and quantitative evaluation of petroleum reserves, to further increase the consequent success of exploration, drilling and production activities. A marginal increase in the prediction accuracy of these petroleum properties is capable of improving the efficiency of exploration, drilling and production of petroleum resources with less time and effort.

In this study, we propose an ensemble model of support vector machines (SVM) based on the diversity exhibited by the regularization parameter. The regularization parameter is an important parameter to tune in order to optimize the SVM model during the training process. The major motivation for this study is the continued quest for better predictions of reservoir properties and the various reports, in other fields of application, of the superior performance of ensemble techniques over their individual base learners. This paper is aimed at achieving the following objectives:

• To review the application of ensemble techniques, especially in petroleum reservoir characterization.
• To establish a premise for the typical need of ensemble techniques in petroleum engineering.
• To investigate the applicability of the stacked generalization ensemble model of support vector machines based on different expert opinions on the regularization parameter.
• To investigate the possible outperformance of the proposed SVM ensemble model over the conventional bagging method and the Random Forest technique using representative reservoir porosity and permeability datasets.
• To demonstrate the superiority of the bagging-based SVM ensemble model over the Random Forest technique.
• To confirm whether the performance of the ensemble technique is better than or as good as the best from among its individual base models.

This is the first study to address the applicability of the ensemble learning paradigm in the prediction of petroleum reservoir properties, especially porosity and permeability. By demonstrating the superior performance of the proposed regularization parameter-driven SVM ensemble learning model over that of the conventional bagging method, this study is expected to boost the interest of petroleum engineering researchers in this new learning paradigm as it promises to open windows of possibilities in its application in petroleum reservoir characterization.

The rest of this paper is organized as follows: Section 2 presents a rigorous review of literature on the ensemble methodology, its application in reservoir characterization, an overview of the SVM technique, and the effect of the regularization parameter on its performance. Section 3 discusses the architecture of the three ensemble models implemented in this study. Section 4 describes the methodology employed in this study from data description through the evaluation criteria to the details of the implementation of the ensemble models. Results are presented and discussed in Section 5 while conclusions on the results are drawn in Section 6.

2. Literature survey

2.1. Ensemble learning methodology

The ensemble learning methodology combines multiple "expert opinions" to solve a particular problem [19]. Each opinion is represented in an instance of the base learners that make up the ensemble model. In regression tasks, each instance attempts to search for the best hypothesis that solves the problem. In ensemble learning, the best hypotheses identified by the base learners are combined using any of the various combination rules to evolve the best overall solution offered by the ensemble model. This methodology of combining the opinions of different "experts" to obtain an overall "ensemble" decision is deeply rooted in human culture, such as in the classical age of ancient China and Greece, and was formalized during the Enlightenment with the Condorcet Jury Theorem, which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence [1]. This also explains why most human activities are usually implemented using the committee system.

The ensemble methodology was originally applied on classification and clustering problems, which include bio-informatics [20], object detection [21,22], gene expressions [1] and protein synthesis [3], and was later extended and applied to time series prediction problems [23,24]. With the ensemble methodology, the selection of the overall best hypothesis helps to improve the performance of a model and reduces the likelihood of an unfortunate selection of a poor model. This resembles the way humans solve problems. Having a committee of experts reduces the risk of taking a wrong decision on a problem. The ensemble methodology makes the selection of such candidate models (representing the best hypotheses) more confident, less risky and unbiased. A generalized flowchart for ensemble techniques is shown in Fig. 1. The ensemble methodology starts with the conventional identification of training and testing data subsets. A finite number of base learners is then established. Each base learner is then constructed with the desired diversity, such as using different input features, random sampling of the input data or different values of a tuning parameter. The individual results produced by the base learners are then combined using a relevant combination algorithm to evolve a single ensemble solution.

As a methodology for classification and clustering problems, it was successfully implemented in the Adaptive Boosting (AdaBoost) technique. It was later extended to regression problems in the form of the Bootstrap Aggregating method, abbreviated as bagging, and implemented in the Random Forest technique [25]. Bagging involves training each of the ensemble base learners on a subset that is randomly drawn from the training data with replacement while giving each data sample equal weight [26].

The major motivation for the ensemble learning paradigm is the statistically sound argument that the paradigm is part of human daily lives: we ask the opinions of several experts before making a decision; we seek the opinions of several doctors before accepting a medical procedure; we read user reviews before choosing a web hosting service provider; we evaluate reports of references before hiring new employees; manuscripts are reviewed by experts before accepting or rejecting them; etc. In each case, the primary objective is to minimize the error associated with the final decision that will be made by combining the individual decisions [7].

2.2. Ensemble learning in reservoir characterization

Petroleum reservoir characterization is the process of estimating and predicting various reservoir properties for use in full-scale reservoir models for the determination of the quality and quantity of a petroleum reservoir. Some of the reservoir properties that are of interest to Petroleum Engineers include porosity, permeability, water saturation, pressure, volume, temperature, oil and gas ratio, bubble point pressure, dew point pressure, well-bore stability, diagenesis and lithofacies. Out of these, porosity and permeability are the most important as they jointly serve as key indicators of reservoir quality. The accuracy of almost all other properties depends on the accuracy of these two properties.

Porosity is the percentage of pores in core samples that are usually extracted from a petroleum reservoir during a coring process. The process involves using specialized devices to take cylindrical samples of rocks at intervals of about one foot for laboratory measurements. The higher the percentage of pores in a rock sample, the greater its ability to hold hydrocarbons, water and gas. Permeability is a measure of how the individual pores in the core samples are interconnected.

No direct relationship has been universally established between these two properties, especially in carbonate geological formations. Hence, if a rock sample is very porous, it may not necessarily be of high permeability [27].

Before the CI technology was embraced in the petroleum industry, these properties used to be calculated from various empirical correlations, followed by the use of linear regression techniques. Presently, the concept of Hybrid Computational Intelligence (HCI) is gaining more popularity in the petroleum engineering domain. As the quest for increased performance of predictive models in reservoir characterization continues to soar, the ensemble methodology offers a great potential for developing better performing and more robust predictive models. Despite the reasonable number of successful applications of CI and HCI techniques in reservoir characterization [8,28–30], the great opportunities of robust model development offered by the ensemble learning technique have not been adequately utilized.

Fig. 1. A generalized flowchart for ensemble techniques (training and testing sets and N base learners are defined; each base learner n is trained and tested in turn to produce a hypothesis Hn, and the combined best Hn is returned).

2.3. Related work on ensemble learning

Most of the applications of the ensemble methodology are found in classification and clustering tasks. Kim et al. [31] showed that even though SVM has been reported to provide good generalization performance, the classification results are often far from the theoretical expectations. Based on this premise, they proposed an ensemble model of SVM and tested it on IRIS classification, handwritten recognition and fraud detection datasets. The reported results showed that the proposed ensemble SVM outperformed the single SVM in terms of classification accuracy. Another ensemble model of SVM was proposed by Chen et al. [21] to detect the occurrence of road accidents. They reported that the ensemble model outperformed the single SVM model. Sun and Li [6] reported a significantly superior performance of an SVM ensemble over an individual SVM model in the prediction of financial distress.

In bioinformatics, Peng [32] and later Chen and Zhao [33] presented ensembles of SVM and ANN classifiers, respectively, for the classification of microarray gene data. They both reported that their respective ensemble techniques performed better than the single SVM and ANN techniques. Nanni and Lumini [3] proposed an ensemble of SVM classifiers for the prediction of bacterial virulent proteins using features that were extracted directly from the amino acid sequence of a given protein rather than those from the evolutionary information of a given protein as is usually done in the literature. Despite their deviation from the well-known feature source, they showed that the ensemble model performed better than a single SVM model used with the conventional feature extraction method. Similarly, positive conclusions were given by Valentini et al. [34] and Caragea et al. [20] about their SVM ensemble models for cancer recognition and glycosylation site prediction, respectively.

Other interesting areas of application of ensembles include hydrogeology [5], time series forecasting [23], customer churn prediction [35], control systems [36], Soil Science [37], detection of concept drift [38], and short-term load forecasting [39]. Other base learners used apart from ANN and SVM include Neuro-Fuzzy Inference Systems [40], Bayesian Inference [41], Fuzzy Inference Systems [42], Decision Trees [43], and Extreme Learning Machines [44].

Despite the ample reports of the successful application of the ensemble learning paradigm in the literature, the benefits offered by the paradigm have not been harnessed in the prediction of porosity, permeability, and other petroleum reservoir properties. This paper is expected to serve as a motivating factor for the appreciation, acceptance and continued application of this new methodology in the petroleum industry.

2.4. Support vector machines and the regularization parameter

SVM is a set of related supervised machine learning methods used for classification and regression. It belongs to the family of Generalized Linear Classifiers. It can also be considered as a special case of Tikhonov Regularization as it maps input vectors to a higher dimensional space where a maximal separating hyperplane is constructed [45]. A conceptual framework of how SVM works is shown in Fig. 2. Input datasets, especially those belonging to the non-separable case, are mapped to a higher dimensional hyperplane where classification and regression become easier. The hyperplane is then optimized to evolve a solution to a problem. The generalization capability of SVM is ensured by special properties of the optimal hyperplane that maximizes the distance to training examples in the high dimensional feature space.

Fig. 2. Mapping data to a high dimensional plane in SVM (the training dataset on the raw data plane is mapped to a maximal prediction plane where structural risk minimization yields the optimal hyperplane).

SVM was initially introduced for the purpose of classification when Vapnik et al. [46] developed a new ε-insensitive loss function technique that is based on statistical learning theory and adheres to the principle of structural risk minimization, seeking to minimize an upper bound of the generalization error. The new technique that emerged out of this modification is known as Support Vector Regression (SVR). It depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction (within a threshold ε). It has been shown to exhibit excellent performance in prediction tasks. Some of the kernel functions used in SVM are:

Linear:
k(x, x′) = x · x′   (1)

RBF:
k(x, x′) = e^(−param·‖x − x′‖)   (2)

Polynomial:
k(x, x′) = (x · x′ + 1)^param   (3)

where param is the kernel parameter in the SVM feature space [47]. More details about the theoretical basis of SVM can be found in Burges [45,47], while cases of successful applications can be found in Abe [13,48,49].
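For illustration only, the three kernels in Eqs. (1)–(3) can be written as a few NumPy functions. This is a sketch rather than code from the study: the function names are ours, and the default values assigned to param are arbitrary assumptions.

```python
import numpy as np

def linear_kernel(x, x2):
    # Eq. (1): inner product of the two input vectors
    return float(np.dot(x, x2))

def rbf_kernel(x, x2, param=0.5):
    # Eq. (2): exponential decay with the distance between the two vectors
    return float(np.exp(-param * np.linalg.norm(np.asarray(x) - np.asarray(x2))))

def polynomial_kernel(x, x2, param=3):
    # Eq. (3): shifted inner product raised to the power of the kernel parameter
    return float((np.dot(x, x2) + 1.0) ** param)
```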
Among the SVM parameters that must be set appropriately for optimal performance is the regularization parameter, C. It is also known in the literature as the penalty factor [50] and the penalty parameter [51]. The regularization parameter is one of the important tuning parameters of SVM whose value could have a great effect on the SVM model performance. It needs to be chosen diligently to avoid the problems of overfitting and underfitting. It helps to maintain a balance between training error and model complexity. The performance of the model is so sensitive to it that its value needs to be chosen carefully. Kecman [51] considered it one of the two most important learning parameters that can be utilized in constructing SVMs for regression tasks. A lot has been written about its sensitivity and importance but no fixed value is known to be universally recommended for it. It turns out, just like the other parameters of machine learning-based techniques, to be controlled by the nature of the available data.

Alpaydin [50], while arguing that the regularization parameter is critical to the performance of the SVM model, recommended that a "proper" value be chosen for it. Due to the difficulty in doing this, he only cautioned that choosing a number that is too large may result in a high penalty for non-separable points, which will lead to overfitting. Conversely, choosing a number that is too small may result in the model not having enough strength to generalize on new cases, which will lead to underfitting. Joachims [52] posited that "a small value for C will increase the number of training errors, while a large C will lead to a behavior similar to that of a hard-margin SVM" but did not give any recommendation. Shawe-Taylor and Cristianini [53] suggested that the value of this parameter should be varied "through a wide range of values", again without giving any specific recommended value. In order to avoid any wrong assumption or giving a false recommendation, Kecman [51] and Cherkassky and Mulier [54] simply advised users to "carefully select the best value". But "how do users carefully select the best value for this parameter?" remains an unanswered question.

The aforementioned arguments are the various and diverse expert opinions on the regularization parameter, C, ranging from it being a critical factor to being an important parameter whose value needs to be chosen carefully [50,51]. Though these expert propositions are enough to establish the existence of the required diversity on the regularization parameter, we still went ahead to present an experimental proof of the diversity (to be discussed in Section 4.3). Given the importance of this parameter to the performance of SVM, the most reasonable way to optimize it is to incorporate all the expert views in an ensemble model and let the model itself combine the results of the propositions to proffer the best solution to the problem. This will serve the purpose of being focused on the solution rather than the techniques.

The major justifications for proposing this novel ensemble algorithm include the diversity in the opinions of experts on the best value to assign to this parameter and the effect of this parameter on the performance of SVM (as further discussed in Section 4.3). Diversity is a major requirement for ensemble learning [55,56].

3. Architecture of the SVM ensemble models

3.1. Our proposed SVM ensemble model

For the proposed ensemble model, we implemented a stacked generalization [7] of the SVM technique. The reason for choosing this architecture is simply that it has not been applied to any petroleum engineering problem. This is a case of an existing method with a new application. A conceptual framework of the proposed algorithm is shown in Fig. 3. Given the perceived diversity in the effect of the regularization parameter on the performance of SVM, we set up 10 instances of SVM as base learners, viz. Model 1, Model 2, ..., Model 10. Each learner uses a different value of C, viz. C1, C2, ..., C10, and the commonly used values for the other tuning parameters. The ensemble model was first trained using bootstrapped samples of the training data combined with a value of Cn. This created the Tier-1 models, whose outputs were then used to train a Tier-2 model (meta-model).

Fig. 3. Conceptual framework of our proposed ensemble model (bootstrapped input data with diverse C values train the Tier-1 SVM models; a Tier-2 (meta) SVM model combines their outputs into the final decision).

In essence, the Tier-2 model combined the outputs of the Tier-1 models. Hence, there was no need for a separate combination algorithm as applies in the conventional bagging method.

The underlying idea of this architecture was to ensure that the Tier-1 models properly learned from the examples in the training data. This is justified by a scenario in which a particular model incorrectly learned a certain region of the input data feature space and hence consistently failed to correctly predict the instances coming from that data region. The Tier-2 model will then be able to learn this behavior. Along with the learned behaviors of the other models, the Tier-2 model can correct such an improper training process. The proposed ensemble technique worked according to the following procedure:

Algorithm 1.
1. Set up the SVM model with the other fixed parameters (epsilon, lambda, kernel and step size).
2. Do for n = 1 to 10 // there are 10 instances of C to create the Tier-1 models
3.   Set Cn to an assigned value
4.   Randomly divide the data into training and testing subsets
5.   Use the training data to train the SVM model, Sn, using Cn
6.   Use the test data to predict the target (porosity and permeability)
7.   Keep the output of the above as Hypothesis, Hn
8. Continue
9. All hypotheses Hn, where n = 1, 2, ..., 10, become input to the Tier-2 model
10. The Tier-2 model is trained with the combined hypotheses.
11. The Tier-2 model outputs the final decision.
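The procedure can be sketched in Python with scikit-learn's SVR standing in for the LS-SVM implementation actually used in this study. The two-tier structure (Tier-1 learners with diverse C values, a Tier-2 meta-learner trained on their stacked outputs, and the mean of the C values for the meta-model) follows Algorithm 1 and Section 4.4; the kernel settings, the single 70:30 split and the helper name are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

def stacked_svm_ensemble(X, y, c_values, seed=0):
    """Sketch of Algorithm 1: Tier-1 SVRs with diverse C values, combined by a Tier-2 SVR."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)

    rng = np.random.RandomState(seed)
    tier1_train, tier1_test = [], []
    for c in c_values:                                   # steps 2-8: one base learner per C value
        idx = rng.randint(0, len(X_tr), len(X_tr))       # bootstrap sample of the training data
        model = SVR(kernel="poly", degree=3, C=float(c)).fit(X_tr[idx], y_tr[idx])
        tier1_train.append(model.predict(X_tr))          # hypothesis H_n on the training data
        tier1_test.append(model.predict(X_te))           # hypothesis H_n on the test data

    # steps 9-11: the Tier-2 (meta) model is trained on the stacked Tier-1 outputs;
    # the mean of the C values is used for the meta-model, as described in Section 4.4
    Z_tr, Z_te = np.column_stack(tier1_train), np.column_stack(tier1_test)
    meta = SVR(kernel="poly", degree=3, C=float(np.mean(c_values))).fit(Z_tr, y_tr)
    return meta.predict(Z_te), y_te
```

In use, the ten diverse C values identified later in Section 4.4 would be passed in as c_values, and the returned predictions would be scored with the evaluation criteria of Section 4.2.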
3.2. Conventional ensemble method

The conventional ensemble method for regression tasks is Bagging [25], implemented in the Random Forest technique [57], an ensemble of Classification And Regression Trees (CART). Its counterpart for classification is called Boosting and was implemented in the AdaBoost technique. In the bagging method, the contribution of each base learner in the ensemble model is given an equal weight. To improve model variance, bagging trains each model in the ensemble using a subset that is randomly drawn from the training set with replacement. The results from the base learners are then averaged over all the base learners to obtain the overall result of the ensemble model.

The main concept of using the bagging method to increase the prediction accuracy of ensembles is similar to reducing high-variance noise using a moving average filter that averages each sample of the data over all available samples. The noise component will be averaged out while the information content of the entire data is unaffected by the averaging operation [58]. When the prediction errors made on the data samples are averaged out, the error of the overall output is reduced. Prediction errors are composed of two controllable components: the accuracy of the model (bias) and the precision of the model when trained on different training sets (variance). Therefore, since averaging has a smoothing (variance-reducing) effect, the goal of bagging-based ensemble systems is to create several classifiers with relatively fixed (or similar) bias and then use the averaging combination rule on the individual outputs to reduce the variance [58]. This is the statistical justification for the bagging method.

The SVM ensemble with the bagging methodology was implemented using the following procedure:

Algorithm 2.
1. Start SVM with all parameters set as optimal as possible.
2. Set N to the number of desired iterations.
3. Set T to the desired percentage of data for the bootstrapped training data.
4. Do for n = 1 to N
5.   Randomly extract T% of the data for training
6.   Use the training data to train the SVM model, Sn
7.   Use the remaining (100 − T)% test data to predict the target variables
8.   Keep the result of the above as Hypothesis, Hn
9. Continue
10. Compute the average of all hypotheses using the Mean() rule:

H_final(x) = (1/N) Σ_{n=1}^{N} H_n(x)
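A minimal sketch of Algorithm 2 follows, again with scikit-learn's SVR as a stand-in for the customized LS-SVM code used in the study. The fixed C value, the bootstrap-with-replacement sampling and the omission of the per-model testing step are simplifying assumptions of the sketch.

```python
import numpy as np
from sklearn.svm import SVR

def bagged_svm(X, y, n_models=10, train_frac=0.7, seed=0):
    """Sketch of Algorithm 2: N SVRs on bootstrapped T% subsets, combined by the Mean() rule."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.RandomState(seed)
    models = []
    for _ in range(n_models):                       # steps 4-9: train one SVR per bootstrap sample
        idx = rng.choice(len(X), size=int(train_frac * len(X)), replace=True)
        models.append(SVR(kernel="poly", degree=3, C=1000.0).fit(X[idx], y[idx]))

    def predict(X_new):
        # step 10: average the individual hypotheses H_n over all base learners
        return np.mean([m.predict(X_new) for m in models], axis=0)

    return predict
```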
3.3. The Random Forest technique

Random Forest is an ensemble learning-based technique that consists of a bagging of un-pruned Decision Tree learners [25] with a randomized selection of input data samples and predictors. The algorithm [59] is based on the bagging technique developed by Breiman [25] and the randomized feature selection by Ho [60,61]. Random Forest begins with building a Tree and then grows more Trees using a bootstrap subsample of the data until the minimum node size is reached, in order to avoid the overfitting that comes with a larger number of Trees.

More details about Decision Trees can be found in Sherrod [62], and application cases can be found in Park et al. [63] and Leibovici et al. [64]. Random Forest has been shown to be effective and accurate [65], but with reports of possible overfitting [66–68], and hence it is liable to perform poorly in prediction tasks.

The algorithm of Random Forest is presented [59] as follows:

Algorithm 3.
1. Starting with a tree:
   a. Set N = number of training cases.
   b. Set M = number of features.
   c. Select a subset m of input variables such that m ≪ M.
   d. Do for n = 1 to N
      i. Train this tree with a bootstrap sample of the training data.
      ii. Use the rest of the cases to estimate the prediction error of the tree.
      iii. Replace the bootstrap sample.
      Continue
   e. Calculate the best split based on these m variables in the training set.
2. The above procedure is iterated over all trees in the ensemble.
3. Calculate the mean of the performance of all trees. This represents the performance of the Random Forest technique.
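For reference, the same scheme is available off the shelf as scikit-learn's RandomForestRegressor, rather than the customized MATLAB implementation used in this study; the hyperparameter values below are illustrative assumptions only.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def random_forest_baseline(X, y, n_trees=100, seed=0):
    """Bagged, un-pruned regression trees with a random subset of predictors at each split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    rf = RandomForestRegressor(
        n_estimators=n_trees,     # number of bootstrapped trees grown in the forest
        max_features="sqrt",      # random subset m << M of the predictors per split
        oob_score=True,           # out-of-bag cases estimate the prediction error
        random_state=seed,
    ).fit(X_tr, y_tr)
    return rf.predict(X_te), y_te, rf.oob_score_
```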
4. Research methodology

4.1. Description of data

For the design, testing and validation of our proposed ensemble model, three porosity datasets and three permeability datasets were used. The porosity datasets were obtained from a petroleum reservoir in the Northern Marion Platform of North America (Site 1) while the permeability datasets were obtained from a reservoir in the Middle East (Site 2). The datasets from Site 1 have six predictor variables for porosity, while the datasets from Site 2 have eight predictor variables for permeability. These are shown in Tables 1 and 2.

Table 1
Predictor variables for Site 1 well logs for porosity.

1  Core
2  Top Interval
3  Grain density
4  Grain volume
5  Length
6  Diameter

Table 2
Predictor variables for Site 2 well logs for permeability.

1  GR     Gamma ray log
2  PHIE   Porosity log
3  RHOB   Density log
4  SWT    Water saturation
5  RT     Deep resistivity
6  MSFL   Micro-spherically focused log
7  NPHI   Neutron porosity log
8  CALI   Caliper log

4.2. Evaluation criteria

In order to effectively evaluate the performance of the base learners ahead of the ensemble models, we used the three commonly used measures of model performance, viz. the correlation coefficient (R-Square), the root mean-squared error (RMSE) and the mean absolute error (MAE), to evaluate the individual base learners. The R-Square is a statistical measure of how strong a relationship is between n pairs of two variables, x and y. It is expressed as:

R-Square = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}   (4)

The RMSE is a measure of the spread of the actual x values around the average of the predicted y values. It is expressed as:

RMSE = √( Σ_{i=1}^{n} (x_i − y_i)² / n )   (5)

The MAE is the average of the absolute errors of the predicted y values relative to the actual x values. It is given by:

MAE = (1/n) Σ_{i=1}^{n} |x_i − y_i|   (6)

The R-Square, RMSE and MAE were used to evaluate the performance of the proposed ensemble model and the conventional SVM technique. For the conventional bagging ensemble model, we used the Mean(R-Square), Mean(RMSE) and Mean(MAE) to obtain the overall performance (presented in Algorithm 2). Random Forest has its Mean() rule for its model combination already embedded in the algorithm (presented in Algorithm 3). Our comparative analysis of the three ensemble models was based on these sets of evaluation criteria.
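Eqs. (4)–(6) translate directly into a few NumPy functions; as in the equations, x denotes the actual (core) values and y the predicted values. This is only a sketch for checking the criteria, not code from the study.

```python
import numpy as np

def r_square(x, y):
    # Eq. (4): correlation between the actual values x and the predicted values y
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2) * (n * np.sum(y**2) - np.sum(y)**2))
    return num / den

def rmse(x, y):
    # Eq. (5): root mean-squared error
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.mean((x - y) ** 2))

def mae(x, y):
    # Eq. (6): mean absolute error
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.mean(np.abs(x - y))
```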
4.3. Diversity measures

The major requirement for the implementation of the ensemble learning paradigm is the existence of diversity in the system [7,56,58]. Due to the importance of diversity [69,70], we used a number of measures to ensure that our proposed ensemble is valid. Most of the diversity measures that were proposed for ensemble learning in the literature were mainly for classification [71–73]. Since our work is on regression, we considered those proposed by Dutta [74]: correlation coefficient, covariance, chi-square, disagreement measure, and mutual information entropy. However, we observed that the first three measures are related to each other. Also, the disagreement measure is exclusively for classification ensembles, similar to that proposed by Kuncheva and Whitaker [75]. Hence, in order to avoid redundancy, we selected the diversity correlation coefficient (DCC) and the mutual information entropy (MIE).

The DCC, ρ, is the degree of closeness between any two of the base learner outputs Ym and Yn such that:

ρ = Σᵢ (y_i^m − Ȳ_m)(y_i^n − Ȳ_n) / √[ Σᵢ (y_i^m − Ȳ_m)² · Σᵢ (y_i^n − Ȳ_n)² ]   (7)

where the sums run over i = 1, ..., N, and Ym and Yn represent the continuous valued outputs of the models Rm and Rn. Ym and Yn are N-dimensional vectors with ym = (y_1^m, y_2^m, ..., y_N^m) and yn = (y_1^n, y_2^n, ..., y_N^n).

The diversity of two predictors is inversely proportional to the correlation between them. Hence, two predictors with low DCC between them (high diversity) are preferred over those with high values. In this study, we defined high correlation to be greater than 0.7.

The MIE diversity measure is defined in terms of Eq. (7) as:

I(Ym, Yn) = −(1/2) log(1 − ρ²)   (8)

where ρ, Ym and Yn remain as previously defined in Eq. (7).
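A short sketch of Eqs. (7) and (8) is given below; the function names are ours, and the 0.7 threshold mirrors the definition of high correlation used in this study.

```python
import numpy as np

def dcc(y_m, y_n):
    # Eq. (7): diversity correlation coefficient between two base-learner outputs
    y_m, y_n = np.asarray(y_m, dtype=float), np.asarray(y_n, dtype=float)
    dm, dn = y_m - y_m.mean(), y_n - y_n.mean()
    return np.sum(dm * dn) / np.sqrt(np.sum(dm**2) * np.sum(dn**2))

def mie(y_m, y_n):
    # Eq. (8): mutual information entropy expressed through the DCC
    rho = dcc(y_m, y_n)
    return -0.5 * np.log(1.0 - rho**2)

def is_diverse(y_m, y_n, threshold=0.7):
    # A pair of predictors with |DCC| below the 0.7 threshold is treated as diverse
    return abs(dcc(y_m, y_n)) < threshold
```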

We also showed diversity by using graphical visualizations. We plotted the effect of different values of the regularization parameter in terms of R-Square, RMSE and MAE to determine their degree of diversity. Graphs that show a high degree of roughness or non-smoothness would be an indication of high diversity. We were able to show pre-implementation and in situ diversity in Section 4.4, and post-implementation diversity in Section 5.

4.4. Implementing the ensemble models

The first step was to extract the various "expert opinions" on the value of C. Since none of the previous studies used our datasets, we could not use the same C values, as we pointed out in Section 2.4. Hence, we derived a "digital" version of these opinions from our datasets. Taking into consideration all the divergent "expert views" in Section 2.4, we performed a sample run of a simple conventional SVM model using all the parameters that worked well in our previous studies with the same datasets and their 70:30 ratio stratification strategy [17,30,76], while varying the value of C between the extremes of 0 and 10,000. Following the common practice in petroleum engineering CI modeling, each dataset was divided into 70% training and 30% testing subsets using a randomized stratification approach [27,77,78]. With this, 70% of each dataset was used for training while the remaining 30% was used for testing. The training subset represents the cored section of the oil and gas reservoir with complete log-core data while the testing subset represents the uncored section with only the log data available and the core values to be predicted. The division of the datasets is shown in Table 3.

Table 3
Division of datasets into training and testing.

                 Porosity                 Permeability
Wells            1      2      3          1      2      3
Data size        415    285    23         355    477    387
Training (70%)   291    200    16         249    334    271
Testing (30%)    124    85     7          106    143    116

The optimal settings for the other tuning parameters used to determine the possible optimal values for C are:

• Error allowance, Lambda = 1e−7
• Penalty for overfitting, epsilon = 0.1
• Type of Kernel = Polynomial (Eq. (3))
• Kernel step size = 0.2

The results of this expert opinion extraction process, showing the performance of the SVM model with respect to the C values, are shown in Figs. 4–9 for all the datasets, respectively.
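The "expert opinion" extraction step can be sketched as a sweep of C with the other parameters held fixed, recording the training ("Actual") and testing ("Predicted") correlations that Figs. 4–9 plot. The grid spacing, the SVR stand-in for the LS-SVM code and the helper name are assumptions of this sketch, not the exact procedure of the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

def sweep_regularization(X, y, c_grid=None, seed=0):
    """Record training and testing correlations for each candidate C, as in the sample runs."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    if c_grid is None:
        c_grid = np.arange(100, 10001, 100)   # assumed grid between the extremes of 0 and 10,000

    results = []
    for c in c_grid:
        model = SVR(kernel="poly", degree=3, epsilon=0.1, C=float(c)).fit(X_tr, y_tr)
        r_train = np.corrcoef(y_tr, model.predict(X_tr))[0, 1]   # "Actual" (training) curve
        r_test = np.corrcoef(y_te, model.predict(X_te))[0, 1]    # "Predicted" (testing) curve
        results.append((c, r_train, r_test, r_train - r_test))   # last entry: train-test gap
    return results   # candidate C values: high r_test together with a small train-test gap
```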
Fig. 4. R-Square of different values of C with porosity Well 1.
Fig. 5. R-Square of different values of C with porosity Well 2.
Fig. 6. R-Square of different values of C with porosity Well 3.
Fig. 7. R-Square of different values of C with permeability Well 1.

The figures showed that there is a lot of diversity in choosing different values for the regularization parameter, as the SVM model behaved differently with each C value with respect to each dataset. Some points clearly showed the occurrence of overfitting. With this sample run, we were able to establish and confirm the existence of pre-implementation diversity. With our observation of this pre-implementation diversity in the effect of the regularization parameter, the assertion of Cherkassky and Ma [79] that this parameter has a negligible effect on the generalization performance needs to be further re-considered.

From the pre-implementation diversity test results, the values of C corresponding to the points of optimal performance of the SVM sample run were extracted using the criterion of least overfitting – points with the least separation between the training and testing lines while maintaining optimality in generalization performance. Table 4 shows the extracted values of C meeting the stated criterion. The extracted values were then collapsed to the 10 shown in Table 5 to ensure simplicity of implementation [80–82] of the proposed ensemble model. We finally applied the six datasets, in turn, on our proposed SVM ensemble with the identified values of C serving as our "expert opinions". We used a customized version of the original MATLAB-based Least-Square SVM (LS-SVM) code available in [83]. The choice of this version of SVM is due to reports of its excellent performance in the literature [84,85].

Table 4
Possible optimal values of C extracted for each dataset.

Data reference    Possible optimal C
Data 1 Well 1     900, 2800, 5600, 6800
Data 1 Well 2     1500, 4500, 5000
Data 1 Well 3     1200, 3500, 5200, 7100, 8100
Data 2 Well 1     3600, 5100, 7600
Data 2 Well 2     500, 3500, 5900, 7000, 7700, 8200
Data 2 Well 3     800, 1800, 3700, 6500, 7600

Table 5
Collapsed possible optimal C values used for the ensemble.

500, 900, 1500, 2800, 3500, 4500, 5600, 6800, 7700, 8200

Fig. 8. R-Square of different values of C with permeability Well 2.
Fig. 9. R-Square of different values of C with permeability Well 3.

After implementing the ensemble models as detailed in the algorithms, we carried out the in situ diversity test using the measures defined in Section 4.3. The base learners were paired consecutively and the diversity measures were applied on their predicted outputs. The results of this test are shown in Table 6. Other pairs were also considered but all gave results that are very close to those displayed. From the results, both the DCC and MIE of the paired base learners are low. This is another confirmation of the existence of diversity in our proposed ensemble model. We used the average of all the C values (4200) for the Tier-2 meta-model. The outputs of the Tier-1 models were combined using the Tier-2 model. With this, the training outputs of the Tier-1 models became the training input data for the Tier-2 model with the original target values remaining unchanged. The testing outputs of the Tier-1 models also became the testing input data for the Tier-2 model, with the original testing target kept for evaluating the generalization capability of the ensemble model. We then used the R-Square, RMSE and MAE criteria to obtain the overall performance of the ensemble model for the purpose of comparison.

For the conventional bagging technique, we also created 10 instances of SVM and gave a bootstrap sample of each dataset to each instance for training. The remaining samples were used for testing the trained instances. The same LS-SVM code [83], but customized with other user-defined functions, was used to implement the bagging algorithm. The performance of this model was also measured using the R-Square, RMSE and MAE of the individual results to obtain the average for the overall performance of the ensemble model.

For the Random Forest, we customized the algorithm found in MATLAB CENTRAL [86] using other functions and toolboxes from the NETLAB repository [87]. Following the usual pattern, the performance of the algorithm with each dataset was measured using the R-Square, RMSE and MAE of the individual results. Since this is the fundamental ensemble algorithm, the Mean() combination rule has been internally applied on the output of the algorithm, which automatically represents the overall performance of the ensemble Tree model. So, we did not make any modification to the combination segment of the code.

After the implementation of the algorithms, we performed the post-implementation diversity test. This was done by visualizing the performance of each base learner of our proposed ensemble model with respect to the evaluation criteria. The graphical visualizations are shown in Figs. 10–15 for all the datasets. From the R-Square (CC), RMSE and MAE performance plots, the performance of the Tier-1 models was uniquely but consistently different from each other.

Table 6
Results of correlation and entropy diversity tests for all datasets (consecutive base-learner pairs).

Diversity measure   Dataset   Pair 1    Pair 2    Pair 3    Pair 4    Pair 5
Correlation         Por 1     0.193     −0.075    0.346     0.176     0.069
                    Por 2     0.249     0.145     0.091     0.236     0.189
                    Por 3     −0.629    0.173     0.202     0.350     0.077
                    Perm 1    0.616     0.661     0.673     0.676     0.461
                    Perm 2    0.308     0.284     0.438     0.181     0.369
                    Perm 3    0.686     0.088     0.699     0.676     0.379
Entropy             Por 1     0.008     0.001     0.028     0.007     0.001
                    Por 2     0.014     0.005     0.002     0.012     0.008
                    Por 3     0.109     0.154     0.009     0.028     0.150
                    Perm 1    0.104     0.125     0.131     0.192     0.052
                    Perm 2    0.022     0.018     0.046     0.007     0.032
                    Perm 3    0.209     0.230     0.146     0.218     0.172

The general irregularity and non-smoothness of the plot lines is a confirmation of the heterogeneity in the performance of the Tier-1 models, hence confirming the existence of high diversity. The result of these three levels of diversity test (pre-, in situ, and post-implementation) is an indication that SVM with the diverse regularization parameter is an ideal candidate for ensemble modeling.

5. Modeling results and discussion

When the performance of our proposed ensemble model was compared with those of the conventional SVM model, the SVM with the bagging method and the Random Forest technique, the results obtained are shown in Figs. 16–20. In terms of the R-Square criterion, Fig. 16 shows that our proposed stacked generalization ensemble model outperformed the others with the highest correlation in most of the six cases. In particular, our proposed model proved to be superior to the conventional SVM and Random Forest techniques on all the datasets. However, the proposed model outperformed the SVM bagging method in four cases (Data 1 Well 1, Data 1 Well 2, Data 2 Well 1 and Data 2 Well 3) while exhibiting stiff competition in two cases (Data 1 Well 3 and Data 2 Well 2). Between SVM bagging and Random Forest, the former proved to be better in five cases out of the six.

Fig. 10. Comparison of R-Square, RMSE and MAE of the SVM base learners for Data 1 Well 1.
Fig. 11. Comparison of R-Square, RMSE and MAE of the SVM base learners for Data 1 Well 2.

Fig. 12. Comparison of R-Square, RMSE and MAE of the SVM base learners for Data 1 Well 3.
Fig. 13. Comparison of R-Square, RMSE and MAE of the SVM base learners for Data 2 Well 1.

The reason for the lower R-Square of SVM bagging than that of Random Forest on Site 1 Well 2 could not be explained at the moment. We treated it as a lone and rare case that needs to be further investigated in our future work with more and different datasets.

With the RMSE criterion, the same trend of the superior performance of our proposed model over the others is shown in Fig. 17 for the porosity datasets and Fig. 18 for the permeability datasets. Our proposed ensemble model gave the least RMSE for all the datasets. It was interesting, however, to observe that the better performance of Random Forest in terms of R-Square over SVM bagging with Site 1 Well 2 in Fig. 16 was nullified by the lower RMSE of SVM bagging than Random Forest (Fig. 17). That does not, however, preclude the necessity of conducting more studies with more and different sets of data to further investigate the comparative performance of the two techniques.

Fig. 14. Comparison of R-Square, RMSE and MAE of the SVM base learners for Data 2 Well 2.

Fig. 15. Comparison of R-Square, RMSE and MAE of the SVM base learners for Data 2 Well 3.

In terms of MAE, our proposed ensemble model showed superiority over the others for all datasets (Figs. 19 and 20). SVM bagging had lower errors than the Random Forest technique in five out of the six cases, while there was stiff competition between SVM bagging and the proposed ensemble model on all datasets except Data 1 Well 1. In the overall evaluation, we posit that our proposed model performed the best, with few cases of stiff competition with the SVM bagging method. Hence, both SVM bagging and the stacked generalization ensemble model proved to be better models for successful and improved petroleum reservoir modeling and prediction than the conventional SVM model and the Random Forest technique. This further confirmed the reports of the robustness and scalability of SVM in handling data of different sizes and dimensionality [4,17,45,47,48,51,84] and the limitation of Decision Trees, which form the basis of the Random Forest technique, in handling data of small size and high dimensionality [66–68].

Fig. 16. R-Square performance of the 3 techniques (all datasets).
Fig. 17. RMSE comparative performance of the 3 techniques (porosity datasets).
Fig. 18. RMSE comparative performance of the 3 techniques (permeability datasets).
Fig. 19. MAE comparative performance of the 3 techniques (porosity datasets).

gain immensely from the application of this new advancement


in machine learning technology. It will be well appreciated in the
petroleum industry since a marginal improvement in the prediction
of reservoir properties, especially porosity and permeability, will
lead to huge increase in the successful exploration, drilling tasks
and production of energy.
With the success of this study, we have been motivated to imple-
ment our proposed ensemble algorithm in Decision Trees using
different number of tree splits as the basis for diversity in the face
of diverse “expert opinions”. The result of that will be compared
with the existing bagging method implemented in Random For-
est. Application of ensemble possibilities of other techniques in
petroleum reservoir characterization and modeling will be inves-
tigated.

Fig. 20. MAE comparative performance of the 3 techniques (permeability datasets). Acknowledgment

The authors would like to thank the Universiti Malaysia Sarawak


6. Conclusion
and King Fahd University of Petroleum and Minerals for providing
the resources used in the conduct of this study.
In this study, we proposed a novel application of the stacked
generalization ensemble model of SVM based on diverse regular-
ization parameter. Different views of experts on the importance References
and sensitivity of this parameter of SVM, giving credence to the
[1] M. Re, G. Valentini, Simple ensemble methods are competitive with state-
need for great caution in setting its value, were presented. Using of-the-art data integration methods for gene function prediction, in: JMLR:
three levels of diversity test: pre-implementation, in situ, and post- Workshop and Conference Proceedings, Mach. Learn. Syst. Biol. 8 (2010)
implementation, we showed the importance of this parameter to 98–111.
[2] A. Chandra, X. Yao, Evolving hybrid ensembles of learning machines for better
the performance of the SVM model and used that to establish the generalisation, Neurocomputing 69 (2006) 686–700.
existence of the required diversity of “expert opinions” for the [3] L. Nanni, A. Lumini, An ensemble of support vector machines for predicting
implementation of our proposed ensemble technique. To prove virulent proteins, Expert Syst. Appl. 36 (2009) 7458–7462.
[4] M.L. Martin, D. Santos-Muñoz, F. Valero, A. Morata, Evaluation of an ensemble
the superior performance of our algorithm, we also implemented precipitation prediction system over the Western Mediterranean area, Atmos.
another SVM ensemble model that is based on the conventional Res. 98 (2010) 163–175.
bagging method. These two methods were compared with Ran- [5] I. Zaier, C. Shu, T.B.M.J. Ouarda, O. Seidou, F. Chebana, Estimation of ice thickness
on lakes using artificial neural network ensembles, J. Hydrol. 383 (3–4) (2010)
dom Forest, a traditional and fundamental ensemble technique 330–340.
that is based on the combination of several instances of Decision [6] J. Sun, H. Li, Financial distress prediction using support vector machines:
Trees and the conventional SVM technique. All the four techniques ensemble vs. individual, Appl. Soft Comput. 12 (8) (2012) 2254–2265.
[7] R. Polikar, Scholarpedia 4 (1) (2009) 2776.
were applied on six datasets of petroleum reservoirs obtained from
[8] J.K. Ali, Neural networks: a new tool for the petroleum industry? in: Proceedings
different geological formations in the prediction of porosity and of Society of Petroleum Engineers European Petroleum Computer Conference,
permeability. The results of the techniques were evaluated using Aberdeen, UK, 1994, pp. 217–231.
[9] L. Jong-Se, Reservoir properties determination using fuzzy logic and neu-
the standard correlation coefficient, root mean square error and
ral networks from well data in offshore Korea J. Pet. Sci. Eng. 49 (2005)
mean absolute error. Based on these metrics, we evaluated the 182–192.
comparative performance of the implemented techniques. [10] A. Abdulraheem, E. Sabakhi, M. Ahmed, A. Vantala, I. Raharja, G. Korvin, Estima-
A rigorous comparison of the three ensemble models with the conventional SVM technique led to the following conclusions:

• The high diversity created by using SVM instances with different values of the regularization parameter makes SVM a good candidate for ensemble modeling.
• Our proposed stacked generalization ensemble model (sketched after this list) outperformed the other ensemble models, with only a few occurrences of stiff competition from the SVM bagging model.
• The observed best performance of our proposed model rests on its demonstrated superiority across all the evaluation criteria.
• On average, the SVM ensemble with the conventional bagging algorithm performed better than the Random Forest technique.
• We further confirmed in this study that the Random Forest technique is susceptible to overfitting on small datasets. This agrees with previous reports [66–68].
• The proposed ensemble model, like the conventional SVM, demonstrated the capability to handle both small and large datasets.
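To make the structure of the proposed model concrete, the following is a minimal, hypothetical sketch of a stacked generalization ensemble whose base learners are support vector regressors differing only in their regularization parameter C, alongside the bagged-SVM and Random Forest benchmarks. It uses scikit-learn rather than the MATLAB toolboxes cited in the references, and the C values, meta-learner, ensemble sizes and synthetic data are illustrative assumptions, not the settings used in this study.

import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import StackingRegressor, BaggingRegressor, RandomForestRegressor

# Base learners: SVR "experts" that differ only in the regularization parameter C.
c_values = [1, 10, 100, 1000]   # illustrative expert opinions on C, not the paper's values
base_learners = [(f"svr_C{c}", SVR(kernel="rbf", C=c, epsilon=0.01)) for c in c_values]

# Stacked generalization: a meta-learner combines the base SVR predictions.
stacked_svm = StackingRegressor(estimators=base_learners,
                                final_estimator=LinearRegression(), cv=5)

# Benchmark ensembles compared against in the study (settings here are assumptions).
bagged_svm = BaggingRegressor(SVR(kernel="rbf", C=100), n_estimators=25, random_state=0)
random_forest = RandomForestRegressor(n_estimators=200, random_state=0)

# Synthetic stand-in for well-log predictors (X) and a core-measured property (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 0.3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

stacked_svm.fit(X[:150], y[:150])
r = np.corrcoef(y[150:], stacked_svm.predict(X[150:]))[0, 1]
print("Stacked SVM correlation coefficient on hold-out data:", r)

In practice, the hold-out evaluation above would be repeated for the bagged SVM, the Random Forest and the single SVM so that the four techniques can be ranked on the same metrics.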
This study has successfully shown that there is great potential for the proposed ensemble learning paradigm in petroleum reservoir characterization, and the petroleum industry stands to benefit from it.
References

[3] … virulent proteins, Expert Syst. Appl. 36 (2009) 7458–7462.
[4] M.L. Martin, D. Santos-Muñoz, F. Valero, A. Morata, Evaluation of an ensemble precipitation prediction system over the Western Mediterranean area, Atmos. Res. 98 (2010) 163–175.
[5] I. Zaier, C. Shu, T.B.M.J. Ouarda, O. Seidou, F. Chebana, Estimation of ice thickness on lakes using artificial neural network ensembles, J. Hydrol. 383 (3–4) (2010) 330–340.
[6] J. Sun, H. Li, Financial distress prediction using support vector machines: ensemble vs. individual, Appl. Soft Comput. 12 (8) (2012) 2254–2265.
[7] R. Polikar, Ensemble learning, Scholarpedia 4 (1) (2009) 2776.
[8] J.K. Ali, Neural networks: a new tool for the petroleum industry?, in: Proceedings of the Society of Petroleum Engineers European Petroleum Computer Conference, Aberdeen, UK, 1994, pp. 217–231.
[9] L. Jong-Se, Reservoir properties determination using fuzzy logic and neural networks from well data in offshore Korea, J. Pet. Sci. Eng. 49 (2005) 182–192.
[10] A. Abdulraheem, E. Sabakhi, M. Ahmed, A. Vantala, I. Raharja, G. Korvin, Estimation of permeability from wireline logs in a Middle Eastern carbonate reservoir using fuzzy logic, in: Proceedings of the 15th SPE Middle East Oil and Gas Show and Conference, Bahrain, 11–14 March, 2007.
[11] D. Kaviani, T.D. Bui, J.L. Jensen, C.L. Hanks, The application of artificial intelligence neural networks with small data sets: an example for analysis of fracture spacing in the Lisbourne formation, northeastern Alaska, SPE J. Reserv. Eval. Eng. 11 (3) (2008) 598–605.
[12] A. Khoukhi, M. Oloso, A. Abdulraheem, M. El-Shafei, Optimized adaptive neural networks for viscosity and gas/oil ratio curves prediction, in: Proceedings of the IASTED International Conference, Banff, Canada, 2010, pp. 14–17.
[13] S. Abe, Fuzzy LP-SVMs for multiclass problems, in: Proceedings of the European Symposium on Artificial Neural Networks, 2004, pp. 429–434.
[14] S. Mohsen, A. Morteza, Y.V. Ali, Design of neural networks using genetic algorithm for the permeability estimation of the reservoir, J. Pet. Sci. Eng. 59 (2007) 97–105.
[15] M.B. Shahvar, R. Kharrat, R. Mahdavi, Incorporating fuzzy logic and artificial neural networks for building hydraulic unit-based model for permeability prediction of a heterogeneous carbonate reservoir, in: Proceedings of the International Petroleum Technology Conference, Doha, Qatar, 7–9 December, 2009.
[16] T. Weldu, S. Ghedan, O. Al-Farisi, Hybrid AI and conventional empirical model for improved prediction of log-derived permeability of heterogeneous carbonate reservoir, in: Proceedings of the Society of Petroleum Engineers Production and Operations Conference and Exhibition, Tunis, Tunisia, 8–10 June, 2010.
[17] F. Anifowose, J. Labadin, A. Abdulraheem, A hybrid of functional networks and support vector machine models for the prediction of petroleum reservoir properties, in: Proceedings of the 11th International Conference on Hybrid Intelligent Systems, IEEE Xplore, 2011, pp. 85–90.
[18] F. Anifowose, J. Labadin, A. Abdulraheem, Prediction of petroleum reservoir properties using different versions of adaptive neuro-fuzzy inference system hybrid models, Int. J. Comput. Inform. Syst. Ind. Manag. Appl. 5 (2013) 413–426.
[19] L. Rokach, Pattern Classification Using Ensemble Methods, World Scientific Publishing Co., 2009.
[20] C. Caragea, J. Sinapov, A. Silvescu, D. Dobbs, V. Honavar, Glycosylation site prediction using ensembles of support vector machine classifiers, BMC Bioinform. 8 (438) (2007), doi:10.1186/1471-2105-8-438.
[21] S. Chen, W. Wang, H. Zuylen, Construct support vector machine ensemble to detect traffic incident, Expert Syst. Appl. 36 (2009) 10976–10986.
[22] J. Wu, J.M. Rehg, Object detection, in: C. Zhang, Y. Ma (Eds.), Ensemble Machine Learning: Methods and Applications, Springer Science+Business Media, LLC, 2012, http://dx.doi.org/10.1007/978-1-4419-9326-7_8.
[23] V. Landassuri-Moreno, J.A. Bullinaria, Neural network ensembles for time series forecasting, in: Proceedings of GECCO'09, Montréal, Québec, Canada, 2009, pp. 8–12.
[24] L. Yu, S. Wang, K.K. Lai, Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm, Energy Econ. 30 (2008) 2623–2635.
[25] L. Breiman, Bagging predictors, Mach. Learn. 24 (2) (1996) 123–140.
[26] G. Liang, X. Zhu, C. Zhang, An empirical study of bagging predictors for different learning algorithms, in: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco, California, USA, 2011, pp. 1802–1803.
[27] T. Helmy, F. Anifowose, K. Faisal, Hybrid computational models for the characterization of oil and gas reservoirs, Int. J. Exp. Syst. Appl. 37 (2010) 5353–5363.
[28] S. Mohaghegh, S. Ameri, Artificial neural network as a valuable tool for petroleum engineers, Unsolicited paper for the Society of Petroleum Engineers, 1995, pp. 1–6.
[29] F. Anifowose, A. Abdulraheem, A functional networks-type-2 fuzzy logic hybrid model for the prediction of porosity and permeability of oil and gas reservoirs, in: Proceedings of the 2nd International Conference on Computational Intelligence, Modeling and Simulation, IEEE Xplore, 2010, pp. 193–198.
[30] F. Anifowose, A. Abdulraheem, Fuzzy logic-driven and SVM-driven hybrid computational intelligence models applied to oil and gas reservoir characterization, J. Nat. Gas Sci. Eng. 3 (3) (2011) 505–517.
[31] H. Kim, S. Pang, H. Je, D. Kim, S.Y. Bang, Constructing support vector machine ensemble, Pattern Recognit. 36 (2003) 2757–2767.
[32] Y. Peng, A novel ensemble machine learning for robust microarray data classification, Comput. Biol. Med. 36 (2006) 553–573.
[33] Y. Chen, Y. Zhao, A novel ensemble of classifiers for microarray data classification, Appl. Soft Comput. 8 (2008) 1664–1669.
[34] G. Valentini, M. Muselli, F. Ruffino, Cancer recognition with bagged ensembles of support vector machines, Neurocomputing 56 (2004) 461–466.
[35] K.W.D. Bock, D.V. Poel, Ensembles of probability estimation trees for customer churn prediction, in: N. García-Pedrajas, et al. (Eds.), IEA/AIE 2010, Part II, LNAI 6097, Springer-Verlag, 2010, pp. 57–66.
[36] D. Pardoe, M. Ryoo, R. Miikkulainen, Evolving neural network ensembles for control problems, in: Proceedings of GECCO'05, Washington, DC, USA, 2005, pp. 25–29.
[37] L. Baker, D. Ellison, Optimisation of pedotransfer functions using an artificial neural network ensemble method, Geoderma 144 (2006) 212–224.
[38] L.L. Minku, X. Yao, DDD: a new ensemble approach for dealing with concept drift, IEEE Trans. Knowl. Data Eng. 24 (4) (2012) 619–633.
[39] M.D. Felice, X. Yao, Short-term load forecasting with neural network ensembles: a comparative study, IEEE Comput. Intell. Mag. 6 (3) (2011) 47–56.
[40] Z. Wang, V. Palade, Y. Xu, Neuro-fuzzy ensemble approach for microarray cancer gene expression data analysis, in: Proceedings of the 2006 International Symposium on Evolving Fuzzy Systems, IEEE Xplore, 2006, pp. 241–246.
[41] A. Tsymbal, S. Puuronen, D.W. Patterson, Ensemble feature selection with the simple Bayesian classification, Inf. Fusion 4 (2003) 87–100.
[42] P.D. Castro, G.P. Coelho, M.F. Caetano, F.J.V. Zuben, in: C. Jacob, et al. (Eds.), ICARIS 2005, LNCS 3627, 2005, pp. 469–482.
[43] Z. Bray, P.O. Kristensson, Using ensembles of decision trees to automate repetitive tasks in web applications, in: Proceedings of EICS'10, Berlin, Germany, June 19–23, 2010.
[44] M. Heeswijk, Y. Miche, T. Lindh-Knuutila, P.A.J. Hilbers, T. Honkela, E. Oja, A. Lendasse, Adaptive ensemble models of extreme learning machines for time series prediction, in: C. Alippi, et al. (Eds.), ICANN 2009, Part II, LNCS 5769, 2009, pp. 305–314.
[45] C.J. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discov. 2 (1998) 121–167.
[46] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[47] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, 1st ed., Cambridge University Press, UK, 2000.
[48] J. Taboada, J.M. Matías, C. Ordóñez, P.J. García, Creating a quality map of a slate deposit using support vector machines, J. Comput. Appl. Maths. 20 (4) (2007) 84–94.
[49] Y. Xing, X. Wu, Z. Xu, Multiclass least squares auto-correlation wavelet support vector machines, Int. J. Innov. Comput. Inf. Control Express Lett. 2 (4) (2008) 345–350.
[50] E. Alpaydin, Introduction to Machine Learning, 2nd ed., The MIT Press, 2010, pp. 224.
[51] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models, MIT Press, 2001, pp. 541.
[52] T. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms, The Springer International Series in Engineering and Computer Science, 2002, pp. 40.
[53] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[54] V.S. Cherkassky, P. Mulier, Learning from Data: Concepts, Theory and Methods, 2nd ed., John Wiley & Sons, 2007.
[55] G. Brown, J.L. Wyatt, P. Tino, Managing diversity in regression ensembles, J. Mach. Learn. Res. 6 (2005) 1621–1650.
[56] R. Polikar, Ensemble based systems in decision making, IEEE Circuits Syst. Mag. 3 (2006) 21–45.
[57] A. Liaw, M. Wiener, Classification and regression by random forest, R News 2 (3) (2002) 18–22.
[58] R. Polikar, Ensemble learning, in: C. Zhang, Y. Ma (Eds.), Ensemble Machine Learning: Methods and Applications 8, Springer Science+Business Media, LLC, 2012, pp. 1–34.
[59] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[60] T.K. Ho, Random decision forest, in: Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August, 1995, pp. 278–282.
[61] T.K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20 (8) (1998) 832–844.
[62] P. Sherrod, DTREG Predictive Modeling Software, 2008, 324 pp., available from: www.dtreg.com.
[63] H. Park, S. Kwon, K. Hyuk-Chul, Complete gini-index text (GIT) feature-selection algorithm for text classification, in: 2nd International Conference on Software Engineering and Data Mining, 2010, pp. 366–371.
[64] D.G. Leibovici, L. Bastin, M. Jackson, Higher-order co-occurrences for exploratory point pattern analysis and decision tree clustering on spatial data, Comput. Geosci. 37 (2011) 382–389.
[65] R. Caruana, N. Karampatziakis, A. Yessenalina, An empirical evaluation of supervised learning in high dimensions, in: Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, 2008.
[66] A. Altmann, L. Tolosi, O. Sander, T. Lengauer, Permutation importance: a corrected feature importance measure, Bioinformatics 26 (10) (2010) 1340–1347.
[67] H. Deng, G. Runger, E. Tuv, Bias of importance measures for multi-valued attributes and solutions, in: Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN), 2011, pp. 293–300.
[68] L. Tolosi, T. Lengauer, Classification with correlated features: unreliability of feature ranking and solutions, Bioinformatics 27 (2011) 1986–1994.
[69] A. Tsymbal, M. Pechenizkiy, P. Cunningham, Diversity in search strategies for ensemble feature selection, Inf. Fusion 6 (1) (2005) 83–98 (special issue on diversity in multiple classifier systems).
[70] L.L. Minku, A.P. White, X. Yao, The impact of diversity on online ensemble learning in the presence of concept drift, IEEE Trans. Knowl. Data Eng. 22 (5) (2010) 730–742.
[71] E.K. Tang, P.N. Suganthan, X. Yao, An analysis of diversity measures, Mach. Learn. 65 (2006) 247–271.
[72] T. Lofstrom, U. Johansson, H. Bostrom, On the use of accuracy and diversity measures for evaluating and selecting ensembles of classifiers, in: Seventh International Conference on Machine Learning and Applications (ICMLA), 2008, pp. 127–132.
[73] S. Wang, X. Yao, Relationships between diversity of classification ensembles and single-class performance measures, IEEE Trans. Knowl. Data Eng. 25 (1) (2013) 206–219.
[74] H. Dutta, Measuring diversity in regression ensembles, in: B. Prasad, P. Lingras, A. Ram (Eds.), Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI 2009), Tumkur, Karnataka, India, 16–18 December, 2009, pp. 2220–2236.
[75] L.I. Kuncheva, C.J. Whitaker, Ten measures of diversity in classifier ensembles: limits for two classifiers, in: A DERA/IEE Workshop on Intelligent Sensor Processing, 2001, pp. 10/1–10/10.
[76] F. Anifowose, A. Abdulraheem, Prediction of porosity and permeability of oil and gas reservoirs using hybrid computational intelligence models, in: Proceedings of the SPE North Africa Technical Conference and Exhibition (NATC 2010), Cairo, Egypt, 2010.
[77] E. Ip, I. Cadez, P. Smyth, Psychometric methods of latent variable modeling, in: N. Ye (Ed.), The Handbook of Data Mining, Lawrence Erlbaum Associates, 2003, p. 238.
[78] A. Khoukhi, S. Albukhitan, PVT properties prediction using hybrid genetic neuro-fuzzy systems, Int. J. Oil Gas Coal Technol. 4 (1) (2011) 47–63.
[79] V. Cherkassky, Y. Ma, Selection of meta-parameters for support vector regression, in: Proceedings of the ICANN, 2002, pp. 687–693.
[80] A.W.F. Edwards, Occam's bonus, in: A. Zellner, H.A. Keuzenkamp, M. McAleer (Eds.), Simplicity, Inference and Modeling: Keeping it Sophisticatedly Simple, Cambridge University Press, UK, 2004, pp. 128–132.
[81] B. Hamming, What explains complexity?, in: A. Zellner, H.A. Keuzenkamp, M. McAleer (Eds.), Simplicity, Inference and Modeling: Keeping it Sophisticatedly Simple, Cambridge University Press, UK, 2004, pp. 120–127.
[82] H.A. Keuzenkamp, M. McAleer, A. Zellner, The enigma of simplicity, in: A. Zellner, H.A. Keuzenkamp, M. McAleer (Eds.), Simplicity, Inference and Modeling: Keeping it Sophisticatedly Simple, Cambridge University Press, UK, 2004, pp. 1–10.
[83] Least Squares SVM (LS-SVM), basic version available online: http://www.esat.kuleuven.be/sista/lssvmlab/ (accessed 25.12.12).
[84] X. Peng, Y. Wang, A normal least squares support vector machine (NLS-SVM) and its learning algorithm, Neurocomputing 72 (2009) 3734–3741.
[85] C.A.L. Bailer-Jones, Statistical Methods, A Computer Course, available from: http://www.mpiahd.mpg.de/~calj/statistical_methods_ss2011/lectures/05_regression.pdf (accessed 21.12.12).
[86] MATLAB Central, Random Forest, available at: http://www.mathworks.com/matlabcentral/fileexchange/31036-random-forest (accessed 04.12.12).
[87] Netlab Toolbox, Neural Computing Research Group, Information Engineering, Aston University, Birmingham, United Kingdom, 2012, available from: http://www.ncrg.aston.ac.uk/netlab (accessed 11.12.12).