
SPE-200949-MS

An Intelligent System for Multi-Label Classification Based on Particle Size and Shape Features using a Cascade Approach

Downloaded from http://onepetro.org/SPETTCE/proceedings-pdf/20TTCE/3-20TTCE/D031S019R002/2452589/spe-200949-ms.pdf/1 by Turkish Petroleum, aziz mennan on 05 August 2021

Hossein Izadi, University of Alberta; Morteza Roostaei, RGL Reservoir Management; Mohammad Soroush,
University of Trinidad and Tobago; RGL Reservoir Management; Mohammad Mohammadtabar and Seyed
Abolhassan Hosseini, University of Alberta; RGL Reservoir Management; Mahdi Mahmoudi, RGL Reservoir
Management; Juliana Leung, University of Alberta; Vahidoddin Fattahpour, RGL Reservoir Management

Copyright 2021, Society of Petroleum Engineers

This paper was prepared for presentation at the SPE Trinidad and Tobago Section Energy Resources Conference held virtually, 28-30 June 2021. The paper was
published in a limited release prior to the virtual conference on 15 January 2021.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract
Intelligent systems are becoming increasingly popular in the petroleum industry. Particle Size Distribution
(PSD) based on sieve size is a key signature of unconsolidated/weakly consolidated sandstone
formations and is commonly the main parameter in sand control design.
Given the extensive PSD measurement techniques available and the large number of measurements, especially
for horizontal wells, PSDs need to be classified before further analysis for sand control design.
However, PSD analysis alone is not sufficient for sand control design, and particle shapes need to be taken
into account as well. A successful clustering algorithm for these purposes needs to be a cascade,
multi-label, unsupervised, and self-adaptive approach, since a particle can be assigned to more than one
group and the number of clusters to be formed is not known in advance. In addition, because sieve size
and shape features differ in nature, they should be used separately for clustering the particles.
In the current study, a cascade approach is used for clustering the particles. In the first level of the cascade,
an unsupervised and self-adaptive algorithm is introduced based on the sieve size features. The algorithm
optimizes the number of clusters through a self-adaptive and incremental approach. The proposed clustering
method uses a minimum similarity threshold (δ) as the only input parameter to start the clustering and tries
to minimize the number of clusters during the clustering. In the second level of the cascade, the similarity
between each particle in a cluster and its cluster center is measured, and particles that do not
satisfy δ in terms of shape similarity are moved out of the cluster.
The novelty of the proposed method is threefold. First, it provides a particle clustering algorithm
that uses the whole range of sizes and shape descriptors rather than focusing on certain points of
the size distribution curve (D-values). Second, the clustering is dynamic and tends to optimize the
number of clusters during the clustering process. Third, a cascade approach is used to incorporate
both size and shape parameters into the clustering. The proposed method can be applied in the field
for downhole monitoring and sand screen design.

Keywords: Cascade Clustering, Unsupervised and Self-adaptive Classification, Particle Size Distribution, Shape Parameters

Introduction
Particle size and shape are fundamental characteristics of reservoir formations and have a significant
impact on the hydraulic and mechanical properties of unconsolidated and weakly consolidated sandstones
(Wen et al., 2002; Mahmoudi et al., 2015). Clustering these features to create a representative dataset from
the numerous measurements obtained in an oil field is an essential task for sand control design.
To the best of our knowledge, no studies have been conducted on particle shape clustering for reservoir
characterization. In addition, despite the importance of particle size for sand control design and
reservoir characterization, few studies have proposed a technique for particle size clustering. There are
a few examples of systematic clustering of oil sand PSDs in Western Canada. Carrigy (1966) developed a
classification for oil sands. Based on the similarity of PSDs and grain size, he grouped the PSD curves into
three main categories: (1) coarse, (2) fine, and (3) very fine sands and silts. More recently, Abram and Cain
(2014) developed another classification for the PSDs of the Devon Pike 1 project. They used a dynamically
growing self-organizing hierarchical clustering algorithm previously developed by Luo et al. (2004), and the
final version of their clustering yielded four clusters. Abram and Cain (2014) argued that, for sand control
design purposes, it is preferable to have a limited number of PSD classes. Later, Fattahpour et al. (2017)
performed another classification based on the data provided by Carrigy (1966) and two PSD data sets provided
by Mahmoudi et al. (2015) from two wells in the McMurray Formation. Fattahpour et al. (2017) used only the
similarity between D10, D50, and D70 for their classification; these D-values have been used extensively in
the literature as the key parameters for sand control design. They identified four major and two minor
classes and compared their results with the classes provided by Carrigy (1966). They concluded that, for the
studied data set, the selected D-values could be used as reliable representatives of the PSD curve for sand
control design purposes. In another study, Izadi et al. (2019) developed an unsupervised and self-adaptive
algorithm for PSD clustering based on sifter data. Despite these studies, three main limitations remain for
particle size and shape clustering. First, particle size and shape features cannot easily be labeled to
create clusters, because they form two different feature vectors, each containing several parameters
(Fig. 1). Second, a feature vector can be assigned to more than one cluster; the problem therefore becomes a
multi-label classification, which poses another challenge for particle size and shape classification (Law
and Ghosh, 2019). Third, anticipating the number of clusters for reservoir particles is almost impractical,
so conventional supervised classification systems, which require the classes to be known in advance, cannot
be used. To deal with all these limitations, a cascade classification based on an unsupervised clustering
algorithm needs to be established that includes both size and shape features and handles both the unknown
number of clusters and the multi-label limitation.


Figure 1—Main components of a Dynamic Image Analysis (DIA) instrument

In this paper, we have developed an intelligent system using a cascade approach. In the first level of
the cascade, an unsupervised clustering algorithm is used to group the particle size data. Afterward, in
the second level of the cascade, shape parameters are used to combine or confirm the previously created
clusters. Finally, the clustering result is reported along with the representative PSD of each cluster.
The proposed algorithm addresses the limitations mentioned above as follows. To deal with the first
limitation, the proposed intelligent system is developed as an unsupervised method based on the cascade
approach, so it can use size and shape features separately, at different levels of the cascade. To deal
with the second limitation, we use a self-adaptive algorithm that can assign one feature vector (size and
shape) to more than one cluster; therefore, the number of clusters can be optimized during the clustering.
For scientific purposes, a larger number of clusters, each representing more distinct characteristics, may
be preferable, while for practical purposes it makes more sense to have fewer clusters. To deal with the
third limitation, we use an online clustering algorithm that does not need the number of clusters in
advance: it incrementally assigns each new feature vector to the previously created clusters, or creates a
new cluster, without repeating the whole clustering procedure. The clustering process scans each particle
feature vector only once, which improves the efficiency of the algorithm and makes it well suited to large
databases. The novelty of the proposed method is threefold. First, it provides a particle clustering
algorithm that uses the whole range of sizes and shapes rather than focusing on certain points of the size
distribution curve (D-values) or on one shape parameter. Second, the clustering is dynamic and tends to
optimize the number of clusters. Third, a cascade approach is used to incorporate both size and shape
parameters into the clustering. The proposed method can be applied in the field for downhole monitoring
and sand screen design.
The rest of this paper is organized as follows. First, the database used in this study is presented.
Then, our methodology for developing the proposed intelligent system is described. Our experimental
results are presented afterward, and finally, concluding remarks and suggestions for future work are given.

Database
Most grain-size analyses of geological materials come from soil science and sedimentology. Sieving, the
procedure of separating the fine and coarse fractions of a sample, is conducted using a series of mesh
sieves with different opening sizes. Standard sieve analysis is a popular technique in the industry
because it is fast, economical, and convenient. With high-precision micro-sieves, the measurement range of
sieving extends from 3 µm to 125 mm, and the accuracy can be further increased by using other separation
equipment. Sieving is the best option for particle sizes larger than 1 mm.
Image analysis is generally defined as any measure taken to interpret an image. With recent advances in
computer technology, image analysis has come to refer mainly to the quantitative interpretation of
digital images. A large number of images can be processed with image analysis software to characterize
particles of different shapes and sizes. Static and dynamic image analyses can be carried out according
to ISO standards (ISO 13322-1; ISO 13322-2), and the average grain size can be measured by image analysis
following ASTM E1382-97.
In DIA (Fig. 1), particles suspended in a flowing fluid pass through an image plane, where they are
tracked with a high-speed camera at a frame rate of several images per second. Image analysis is performed
on the compressed raw images to characterize particles of different shapes. The main challenge is particle
overlap, which can be resolved by passing the particles through a narrow opening. Dispersion techniques
based on air or liquid can also be used to distribute the particles effectively.
In this study, we collected feature vectors for 215 particles. Each particle has 12 sieve-size values and
6 shape-descriptor curves (Table 1).

Table 1—Selected parameters used to describe the shape of the particles

Parameter | Equation | Variables | Remarks
--- | --- | --- | ---
Aspect Ratio | (1) | | A descriptor of the particle angularity.
Convexity | (2) | A: projection area; B: area of concave regions | A descriptor of the particle compactness. Theoretically, the maximum value is 1, where there are no bays.
Roundness | (3) | r_i: radius of the inscribed circle at convex corner i; R: radius of the confining circle; n: number of convex regions | A measure of the smoothness of the particle.
Sphericity | (4) | P: perimeter; A: area; P_real: perimeter of the particle projection | A measure of the particle shape irregularity, with a value between 0 and 1. The smaller the value, the more irregular the shape.
Straightness | (5) | | A descriptor of fiber curl, with a value between 0 and 1. Lower values correspond to more curl in the fibers.
Elongation | (6) | | A shape descriptor for fibers.
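Table 1 lists the variables of the shape descriptors but not their explicit formulas. As a minimal sketch, assuming commonly used forms that are consistent with those variables and value ranges (assumptions, not necessarily Eqs. 2 to 4 of this paper), three of the descriptors could be computed as follows: convexity as A / (A + B), sphericity as the perimeter of the equal-area circle divided by the real perimeter, 2√(πA) / P_real, and a Wadell-style roundness as the mean corner radius r_i divided by R. The function names are hypothetical.

```python
import math
from typing import Sequence


def convexity(area: float, concave_area: float) -> float:
    """Assumed form A / (A + B): A is the projection area and B the area of
    the concave regions (bays). Equals 1 when the outline has no bays."""
    return area / (area + concave_area)


def sphericity(area: float, real_perimeter: float) -> float:
    """Assumed form: perimeter of the circle with the same area, 2*sqrt(pi*A),
    divided by the real perimeter of the projection. Equals 1 for a circle
    and decreases toward 0 as the outline becomes more irregular."""
    return 2.0 * math.sqrt(math.pi * area) / real_perimeter


def roundness(corner_radii: Sequence[float], big_r: float) -> float:
    """Assumed Wadell-style form: mean radius r_i of the circles inscribed at
    the n convex corners, divided by the reference radius R."""
    return sum(corner_radii) / (len(corner_radii) * big_r)


if __name__ == "__main__":
    # A unit square projection: area 1, perimeter 4, no concave regions.
    print(convexity(1.0, 0.0))                    # 1.0
    print(round(sphericity(1.0, 4.0), 4))         # 0.8862
    # Hypothetical corner radii for some outline, with R = 1.
    print(roundness([0.25, 0.5, 0.75], 1.0))      # 0.5
```

Under these assumed forms, all three descriptors stay within [0, 1] for physical inputs, which matches the ranges given in Table 1.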

Methodology
Traditional supervised and unsupervised machine learning algorithms are mostly developed for
single-label data, i.e., each data point is assigned to only one cluster (Law and Ghosh, 2019).
However, particle shape and size features can be related to more than one cluster. Therefore, to extract
the different PSD patterns from the database, an unsupervised clustering algorithm is used in a cascade
approach. Unsupervised clustering can group data without any prior labels being assigned to the data.
We have used the size distribution of the samples and their corresponding shape descriptors.
Our cascade classification approach clusters the PSDs with different classifiers that use different
features at different steps (Fig. 2). The pseudo code of the proposed method is presented in Algorithm 1,
and the details of each step shown in Fig. 2 are provided in the following subsections.



Figure 2—Flowchart of the proposed multi-labeled cascade classification

Algorithm 1: Pseudo code of the proposed intelligent system for multi-label classification
Input: Sieve size, DIA shape and size parameters
Start Algorithm:
1.   Set δ ← minimum similarity threshold
// Pre-processing
2.   Normalized PSD histograms ← normalize the histograms created for the PSDs
// Clustering based on the sieve sizes, the first level of the cascade
3.   Feature vectors ← extract the 12 sieve-size values of each particle
4.   for each feature vector ∈ Feature vectors do
5.       Clusters ← assign the feature vector to every cluster whose center is similar to it with respect to δ (or create a new cluster if none is similar), and combine clusters that share PSD(s)
6.   end for
// Clustering based on DIA shape parameters, the second level of the cascade
7.   Feature vectors2 ← extract the 12×6 DIA shape parameters of each particle
8.   Clusters2 ← confirm each cluster ∈ Clusters, or move the particles that do not satisfy δ out of the cluster
9.   return Clusters2
Output: Clusters and their corresponding centers as the representative particles

Preprocessing
In clustering, the data should be normalized to prevent the clustering from being biased toward large
values. Therefore, we have used the following equation (Eq. 7) to linearly normalize our data between
0 and 1 (Rashedi et al., 2009).

$$X_i = \frac{V_i^{input} - V_i^{min}}{V_i^{max} - V_i^{min} + eps}, \quad i = 1, \dots, n \qquad (7)$$

where $X_i$ is the normalized histogram value, n is the number of mesh sizes, eps is an arbitrarily small
positive number that prevents infinite values (division by zero), $V_i^{input}$ is the input value, and
$V_i^{min}$ and $V_i^{max}$ are the minimum and maximum values for each mesh size i in the histogram,
respectively. We have normalized all the inputs, including the sieve size and shape parameters.
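Eq. 7 applies min-max scaling separately to each mesh size. A minimal Python sketch, assuming the histograms are stored as equal-length lists (one value per mesh size) and using an arbitrary eps of 1e-12; the function name is hypothetical:

```python
from typing import List, Sequence


def normalize_histograms(
    histograms: Sequence[Sequence[float]], eps: float = 1e-12
) -> List[List[float]]:
    """Min-max normalize every mesh size (column) to [0, 1] across all samples,
    following X_i = (V_i - V_i_min) / (V_i_max - V_i_min + eps)."""
    n = len(histograms[0])  # number of mesh sizes
    v_min = [min(h[i] for h in histograms) for i in range(n)]
    v_max = [max(h[i] for h in histograms) for i in range(n)]
    return [
        [(h[i] - v_min[i]) / (v_max[i] - v_min[i] + eps) for i in range(n)]
        for h in histograms
    ]


if __name__ == "__main__":
    raw = [[0.0, 10.0, 5.0], [10.0, 30.0, 5.0], [5.0, 20.0, 5.0]]
    for row in normalize_histograms(raw):
        print([round(x, 6) for x in row])
    # [0.0, 0.0, 0.0]
    # [1.0, 1.0, 0.0]
    # [0.5, 0.5, 0.0]
```

In this toy input the third mesh size has the same value for every sample, so its range is zero; the eps term keeps the result at 0 instead of dividing by zero.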

Feature extraction #1 and the first level of the cascade


In the first level of the cascade, the incremental clustering algorithm is applied to the sieve size
parameters. The similarity used to compare incoming data with the centers of previously created clusters
is measured as in Eq. 8.

(8)

where Euclidean_distance = $\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, n is the number of mesh sizes,
$X_i = [x_1, x_2, \dots, x_n]$ is the input particle-size vector, and $Y_j = [y_1, y_2, \dots, y_n]$ is the
center of a previously created cluster. The steps of the proposed clustering algorithm are presented as
follows (Sadri et al., 2006).
1. Set the minimum similarity threshold (δ).
2. Initialize the list of cluster centers as empty (ϕ).
3. Read the next input PSD.
4. Find all cluster centers whose similarity to the input PSD is greater than δ.
If found: assign the PSD to those cluster(s), update their centers, and combine the clusters that have
at least one PSD in common.
If not found: create a new cluster and set the input PSD as its center.
5. Repeat steps 3-4 for all input PSDs.
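The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the similarity of Eq. 8 is assumed to be 1 − Euclidean_distance/√n (which lies in [0, 1] for normalized features), a cluster center is assumed to be the mean of its members, and all function names are hypothetical. Because a PSD that matches several centers makes those clusters share it, the sketch combines them immediately.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def similarity(x: Vector, y: Vector) -> float:
    """Assumed stand-in for Eq. 8: 1 - Euclidean distance / sqrt(n).

    For features normalized to [0, 1], the largest possible distance is
    sqrt(n), so the result lies in [0, 1].
    """
    dist = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return 1.0 - dist / math.sqrt(len(x))


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Assumed center update: component-wise mean of the member PSDs."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def incremental_clustering(
    psds: Sequence[Vector], delta: float
) -> Tuple[List[List[int]], List[List[float]]]:
    """One pass over the PSDs (steps 1-5); returns member indices and centers."""
    clusters: List[Set[int]] = []       # step 2: start with no clusters
    centers: List[List[float]] = []
    for idx, psd in enumerate(psds):     # step 3: read the next input PSD
        # step 4: find every center whose similarity is greater than delta
        hits = [k for k, c in enumerate(centers) if similarity(psd, c) > delta]
        if not hits:
            # not found: the PSD becomes the center of a new cluster
            clusters.append({idx})
            centers.append(list(psd))
            continue
        # found: the PSD joins every matching cluster, so those clusters now
        # share it and are combined; the combined center is recomputed
        merged = set().union(*(clusters[k] for k in hits)) | {idx}
        keep = [k for k in range(len(clusters)) if k not in hits]
        clusters = [clusters[k] for k in keep] + [merged]
        centers = [centers[k] for k in keep] + [
            mean_vector([psds[i] for i in sorted(merged)])
        ]
    return [sorted(c) for c in clusters], centers


if __name__ == "__main__":
    # Toy two-mesh PSDs on the diagonal, so similarity = 1 - |a - b|.
    psds = [[v, v] for v in (0.00, 0.06, 0.19, 0.25, 0.125)]
    clusters, centers = incremental_clustering(psds, delta=0.9)
    print(clusters)  # [[0, 1, 2, 3, 4]]
    # PSD 4 bridged two clusters; after the merge PSD 0 falls below delta.
    print(round(similarity(psds[0], centers[0]), 3))  # 0.875
```

In the toy run, the first four PSDs form two clusters, and the fifth PSD is similar to both centers, so the two clusters are combined. The recomputed center then sits far enough from PSD 0 that its similarity drops to 0.875, below δ = 0.9, which illustrates how combining clusters can leave some members below the minimum similarity threshold.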
In the first level of the cascade, the algorithm tends to minimize the number of clusters because of the
multi-label nature of the clustering. As presented in Eq. 9, each sample can be assigned to one or more
clusters, and clusters that share a sample are joined together (Sadri et al., 2006). Because the center of
a joined cluster is updated from its members, some samples may no longer satisfy the minimum similarity
threshold.

(9)

Feature extraction #2 and the second level of the cascade


All clusters are passed to the second level of the cascade, where the DIA shape-feature similarity between
each particle in a cluster and its cluster center is measured using Eq. 8. If the similarity is greater
than δ, the cluster created in the first level of the cascade is confirmed in the second level and reported
as one of the final clusters; otherwise, the particles that do not satisfy δ are moved out of the cluster.
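A minimal sketch of this second-level check, assuming each particle's DIA shape descriptors are flattened into one normalized vector (for example, the 12×6 values), that the shape-space cluster center is the mean of the members' shape vectors, and that the similarity of Eq. 8 is approximated by 1 − Euclidean_distance/√n. The text does not specify whether the center is instead the shape vector of the representative particle, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def similarity(x: Vector, y: Vector) -> float:
    """Assumed stand-in for Eq. 8: 1 - Euclidean distance / sqrt(n)."""
    dist = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return 1.0 - dist / math.sqrt(len(x))


def second_level(
    clusters: List[List[int]], shapes: Dict[int, Vector], delta: float
) -> Tuple[List[List[int]], List[int]]:
    """Keep members whose shape similarity to the cluster's shape center is
    greater than delta; move the others out.

    Returns (confirmed clusters, particles moved out).
    """
    confirmed: List[List[int]] = []
    removed: List[int] = []
    for members in clusters:
        vectors = [shapes[i] for i in members]
        # assumed shape-space center: mean of the members' shape vectors
        center = [sum(col) / len(vectors) for col in zip(*vectors)]
        kept = [i for i in members if similarity(shapes[i], center) > delta]
        removed.extend(i for i in members if i not in kept)
        if kept:
            confirmed.append(kept)
    return confirmed, removed


if __name__ == "__main__":
    # Toy two-value shape vectors; particle 3 has a clearly different shape.
    shapes = {i: [v, v] for i, v in enumerate((0.50, 0.52, 0.54, 0.80, 0.10, 0.12))}
    confirmed, removed = second_level([[0, 1, 2, 3], [4, 5]], shapes, delta=0.9)
    print(confirmed)  # [[0, 1, 2], [4, 5]]
    print(removed)    # [3]
```

Here the first cluster is kept except for particle 3, whose shape similarity to the cluster center is 0.79, while the second cluster is confirmed unchanged.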

Results
In this section, the results of our method are presented. We used a conventional computer with an Intel®
Core™ i5-2410M CPU @ 2.30 GHz, 4 GB of RAM, and Windows 7.
The optimum value of δ (minimum similarity threshold) is 0.9; however, the user can change δ and
observe the corresponding clustering results and optimum aperture size. The optimum value of δ was set
by trial and error, keeping the order in which the PSDs are fed to the clustering fixed. After δ was set
and fixed, the order in which the PSDs are fed to the algorithm was chosen randomly. We also selected the
top_n clusters that together cover more than 90% of all PSDs. The running time was 75 seconds on this
standard computer, which shows that the proposed method is efficient.
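The top_n selection is described only briefly. One plausible reading, sketched here as an assumption, is to rank the clusters by size and keep adding them until their combined membership covers more than 90% of all PSDs; the function name is hypothetical.

```python
from typing import List, Sequence


def top_n_clusters(
    clusters: Sequence[Sequence[int]], total_psds: int, coverage: float = 0.9
) -> List[Sequence[int]]:
    """Greedily keep the largest clusters until they cover more than
    `coverage` of all PSDs (each PSD is counted once)."""
    chosen: List[Sequence[int]] = []
    covered = set()
    for members in sorted(clusters, key=len, reverse=True):
        chosen.append(members)
        covered.update(members)
        if len(covered) / total_psds > coverage:
            break
    return chosen


if __name__ == "__main__":
    clusters = [list(range(0, 50)), list(range(95, 100)),
                list(range(50, 80)), list(range(80, 95))]
    picked = top_n_clusters(clusters, total_psds=100)
    print([len(c) for c in picked])  # [50, 30, 15]
```

Counting covered PSDs as a set keeps a PSD that appears in two selected clusters from being counted twice.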
The clustering results for the first level of the cascade are shown in Fig. 3. As Fig. 3 shows, there are
six clusters whose centers completely cover the PSD range of the initial data.

Figure 3—Clustering results in the first level of the cascade.



In the second level of the cascade, particles whose shape-descriptor similarity to their cluster center
does not satisfy δ are removed. As Table 2 shows, in the first cluster, for example, the similarity in
terms of sieve size may satisfy δ while the similarity in terms of shape parameters does not. Accordingly,
certain PSDs are removed from the cluster. The clusters after the second level of the cascade are shown
in Fig. 4.


Figure 4—Clustering results in the second level of the cascade.



Table 2—Similarities in the first and the second level of the cascade.

# | Similarity in the first level of the cascade | Similarity in the second level of the cascade
--- | --- | ---
1 | 0.86 | 0.94
2 | 0.51 | 0.90
3 | 0.81 | 0.78
4 | 0.88 | 0.91
5 | 0.88 | 0.77
6 | 0.88 | 0.70
7 | 0.02 | 0.76
8 | 0.02 | 0.89
9 | 0.01 | 0.86
10 | 0.85 | 0.94
11 | 0.90 | 0.95
12 | 0.88 | 0.95
13 | 0.86 | 0.87
14 | 0.89 | 0.92
15 | 0.85 | 0.85
16 | 0.89 | 0.96
17 | 0.89 | 0.93
18 | 0.88 | 0.91
19 | 0.89 | 0.95
20 | 0.85 | 0.90
21 | 0.90 | 0.93
22 | 0.87 | 0.94
23 | 0.86 | 0.93
24 | 0.87 | 0.95
25 | 0.86 | 0.93
26 | 0.84 | 0.95
27 | 0.85 | 0.82
28 | 0.86 | 0.90
29 | 0.88 | 0.92
30 | 0.86 | 0.90
31 | 0.89 | 0.95
32 | 0.89 | 0.94
33 | 0.79 | 0.91
34 | 0.80 | 0.91
35 | 0.87 | 0.95
36 | 0.86 | 0.97
37 | 0.87 | 0.96
38 | 0.01 | 0.94

Conclusions and Future Work


In this paper, we have developed an intelligent system for clustering particles based on their sieve
size distribution, which is commonly the main parameter in sand control design, and on shape parameters
measured by the DIA method. Given the large number of PSD measurements for horizontal wells, PSDs should
be grouped for practical experiments and analysis. To address this challenge, we designed a cascade
clustering algorithm that is multi-label, unsupervised, and self-adaptive. These properties are important
because, in real applications, a PSD can be assigned to more than one cluster and the number of clusters
to be formed is not known in advance.
In the first level of the cascade, an unsupervised and self-adaptive algorithm is introduced based on
sieve size features. The algorithm optimizes the number of clusters through a self-adaptive and incremental
approach. The proposed clustering method uses a minimum similarity threshold (δ) as the only input
parameter to start the clustering and tries to minimize the number of clusters during the clustering. In the
second level of the cascade, the shape-feature similarity between each particle in a cluster and its
cluster center is measured, and particles that do not satisfy δ in terms of shape similarity are moved out
of the cluster.
In future work, other features such as the depth of each PSD or the morphology could be added to
investigate the effect of formation and lithology on the clusters.

Acknowledgments
The authors would like to acknowledge the financial support and the permission provided by RGL Reservoir
Management Inc.

References
Abram, M. and Cain, G. 2014. Particle-Size Analysis for the Pike 1 Project, McMurray Formation. J Can Pet Technol 53
(6): 339-354. SPE 173890-PA. https://doi.org/10.2118/173890-PA.
ASTM E1382-97, Standard Test Methods for Determining Average Grain Size Using Semiautomatic and Automatic Image
Analysis. 2015.
Carrigy, M. 1966. Lithology of the Athabasca Oil Sands. Bulletin 18, Edmonton: Research Council of Alberta.

Fattahpour, V., Maciel, V., Mahmoudi, M. et al. 2017. Classification of Alberta Oil Sands Based on Particle Size
Distribution for Sand Control Design and Experimental Applications. Presented at the SPE Canada Heavy Oil
Technical Conference, Calgary, Alberta, Canada, 15-16 February. SPE-185000-MS. https://doi.org/10.2118/185000-MS.
ISO 13322-1, Particle size analysis — Image analysis methods — Part 1: Static image analysis methods. 2014.
ISO 13322-2, Particle size analysis — Image analysis methods — Part 2: Dynamic image analysis methods. 2006.
Izadi, H., Fattahpour, V., Roostaei, M. et al. 2019. Unsupervised and Self-Adaptive Algorithm for Particle Size Distribution
Clustering. Presented at the Geoconvention, Calgary, Alberta, Canada, 13-17 May.
Law, A. and Ghosh, A. 2019. Multi-Label Classification Using a Cascade of Stacked Autoencoder and Extreme Learning
Machines. Neurocomputing 358: 222-234. https://doi.org/10.1016/j.neucom.2019.05.051.
Luo, F., Khan, L., Bastani, F. et al. 2004. A Dynamically Growing Self-Organizing Tree (DGSOT) for Hierarchical Clustering
Gene Expression Profiles. Bioinformatics 20 (16): 2605-2617. https://doi.org/10.1093/bioinformatics/bth292.
Mahmoudi, M., Fattahpour, V., Nouri, A. et al. 2015. Oil Sand Characterization for Standalone Screen Design and Large-
Scale Laboratory Testing for Thermal Operations. Presented at SPE Thermal Well Integrity and Design Symposium,
Banff, Alberta, Canada, 23-25 November. SPE-178470-MS. https://doi.org/10.2118/178470-MS.
Rashedi, E., Nezamabadi-Pour, H., and Saryazdi, S. 2009. GSA: A Gravitational Search Algorithm. Inf Sci 179 (13):
2232-2248. https://doi.org/10.1016/j.ins.2009.03.004.
Sadri, J., Suen, C. Y., and Bui, T. D. 2006. A New Clustering Method for Improving Plasticity and Stability in Handwritten
Character Recognition Systems. Presented at the 18th International Conference on Pattern Recognition, Hong Kong,
China, 20-24 Aug. 9209932. https://doi.org/10.1109/ICPR.2006.114.
Wen, B., Aydin, A., and Aydin-Duzgoren, S. 2002. A Comparative Study of Particle Size Analysis by Sieve-Hydrometer
and Laser Diffraction Method. Geotech Test J 25 (4): 434-442. https://doi.org/10.1520/GTJ11289J.
