
SPE-194573-MS

Usage of Data Science to Predict String Integrity Failures

Rahul Singh and Dipayan Baidya, Reliance Industries Ltd.

Copyright 2019, Society of Petroleum Engineers

This paper was prepared for presentation at the SPE Oil and Gas India Conference and Exhibition held in Mumbai, India, 9-11 April 2019.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract
This paper discusses the various approaches taken to reduce downtime in coal bed methane (CBM) wells by predicting string integrity failures in advance. Approaches such as Principal Component Analysis (PCA) based T-statistics and the bag of features approach have been used to find a solution. Both fall under the classical supervised-learning classification framework. The power and usefulness of these approaches are fuelled, and limited, by the number and richness of annotations and sensors at the well site. Applied to the CBM wells, these approaches gave very encouraging results, demonstrating their usefulness in enhancing well efficiency and decreasing well downtime by planning work based on the failure predictions of the model.

Introduction
Reliance Industries Limited is the operator of a Coal Bed Methane (CBM) block, SP (West)-CBM-2001/1, in the Sohagpur coal basin, located in Shahdol district of Madhya Pradesh state in Central India. Commercial gas production commenced on 24th March 2017, and progressive cavity pumps (PCPs) are used to offload water from the wells. The operator observed early gas breakthrough in most of the wells, which helped to ramp up gas production rapidly. However, a few instances of string integrity failure leading to workover jobs have also been observed. Each such instance leaves scope for improvement, and the operator therefore wants to plug this gap to improve efficiency.
Coal bed methane gas stays adsorbed on the internal surface of the coal matrix. CBM gas production follows a trend characterized by a non-monotonic curve of increase and eventual gradual decrease over an approximate period of 20 years, as shown in Figure 1. The first stage (1), also known as the dewatering phase, involves, as the name suggests, dewatering of the reservoir and hence a successive increase in CBM gas production. The second stage (2), known as the stable production stage, is where production reaches its peak due to continuous dewatering and achieves a constant gas production rate. The third stage (3) is the decline phase, where production slowly declines as the driving pressure of the reservoir gradually decreases.

Figure 1—Production Stages of Coal Bed Methane well

PCPs are positive displacement pumps which consist of two key parts: the stator and the rotor. The stator remains stationary while the rotor rotates.
When the rotor is inserted into the stator, it creates an interference fit with the stator elastomer. This
interference fit extends from the pump intake end to the discharge end. As a result, a series of cavities are
created which progress from the pump suction to the pump discharge, with the movement of the rotor inside
the stator. A cavity opens up as another closes, forming a linear chain of fluid movement which is delivered
through the discharge end.
The stator is installed in the well at the end of the tubing string, and the rotor is attached to the end of the
sucker rod string. The sucker rod string is driven by an electric motor attached to a direct drive system on
the well surface. This motor provides rotational motion to the rotor allowing it to rotate in the fixed stator.
The sizing of this sucker rod string is dependent on the dewatering rates expected and the targeted water
level depths for reducing the back pressure on coal seams which leads to gas desorption.
In the current configuration at SP West gas wells, a 1" sucker rod string is run inside a 2 7/8" tubing string in the production casing. The sucker rod string is continuously centralized to limit contact between the sucker rod couplings and the internal surface of the tubing, where the clearance is significantly reduced.
It is vital to note that tubing string failures have occurred at all depths, ranging from a few meters to hundreds of meters. At lower depths, tubing punctures may be explained by isolated cases of a bend in the polished rod string causing it to come into direct, sustained contact with the tubing's internal surface. Tubing age is another possible contributor to failures.
Tubing integrity failures have been encountered in our wells on account of the tubing being either punctured or, in a few cases, even sheared. The tubing shear cases have the potential to greatly reduce gas production from the well due to non-retrieval of the pump from the well bore. This affects the new pump intake depth and can block crucial gas seams from being exposed by the lowering of water levels.
For better monitoring and control of well parameters, SCADA systems have been installed at the well sites, which is a big boon for monitoring CBM wells. Pump parameters such as rotations per minute (RPM), torque and current, in addition to other data such as tubing head pressure (THP) and well head casing pressure (WHP), are recorded and monitored in real time through these SCADA systems. Presently, the response to these failures is reactionary in nature: we are able to successfully detect failures, but this hampers our rig and material planning activities. In this scenario, there is a need to predict tubing integrity failures to provide leeway for resource identification and mobilization in an optimized manner. Thus, through the data science project on tubing integrity failure prediction, we aim to create an application which forewarns us of impending failures so that planning activities are improved. We aim to develop a predictive method based on the multivariate analysis of the available well data. This would help in identifying possible well failures and taking corrective measures.

Algorithms
Principal Component Analysis
This feature extraction technique aims at capturing the maximum variance of the data using new orthogonal dimensions, called principal components, generated from the original data dimensions. The new dimensions are orthogonal to each other and hence uncorrelated, thereby entirely eliminating the problem of multicollinearity. Each observation transformed into the new principal component space (variable system) is called a score. The PCA model is represented as:

X = S Lᵀ + R

where
X is the input matrix (n × m)
S is the score matrix (n × p): the observations projected into the new space of orthogonal PCs
L is the loading matrix (m × p): the relation between the old variables and the new variables
R is the residual matrix (n × m): the part of X not captured by the p retained components
n = number of observations
m = number of parameters
p = number of principal components
The new variables, or principal components, are always arranged in descending order of the variance each component explains. The first PC is always the variable which explains the most variance, and so on.

The subspaces spanned by X′ = S Lᵀ and R = X − S Lᵀ are called the Score Space and the Residual Space respectively. Let si be the ith column of the score matrix S. The following properties hold:
1. Var(s1) > Var(s2) > … > Var(sp)
2. Mean(si) = 0, ∀ i
3. siᵀ sk = 0, ∀ i ≠ k
4. There exists no other orthogonal expansion that captures more variation of the data.
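As an illustrative sketch (not the operator's actual implementation), the decomposition and the properties above can be reproduced in Python with numpy; the data below is synthetic and all variable names are ours:

import numpy as np

def pca_scores_loadings(X, p):
    """Return scores S (n x p), loadings L (m x p) and PC variances of X."""
    Xc = X - X.mean(axis=0)                  # mean-center each parameter
    cov = np.cov(Xc, rowvar=False)           # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-order by descending variance
    L = eigvecs[:, order[:p]]                # loading matrix (m x p)
    lam = eigvals[order[:p]]                 # variances of the retained PCs
    S = Xc @ L                               # score matrix (n x p)
    return S, L, lam

# Synthetic stand-in for n observations of m = 5 well parameters.
X = np.random.rand(1000, 5)
S, L, lam = pca_scores_loadings(X, p=3)
# The columns of S have zero mean, are mutually orthogonal and are ordered
# by decreasing variance, as stated in properties 1-3 above.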
T² Statistics. This is an indication of the systematic variation in the process measured in the score space. It directly measures the variation along the loading vectors:

T² = xᵀ La Λa⁻¹ Laᵀ x

where La contains the first a loading vectors and Λa contains the first a columns and rows of the covariance matrix. The number of new variables used for the T² statistic should be limited to the first a components, since inaccuracies in the smaller components, due to their low signal-to-noise ratio, get amplified in the calculated T² statistic.

Q Statistics. The portion of the observation space corresponding to the lower (p − a) principal components is better measured using the Q statistic:

Q = rᵀ r, where r = (I − La Laᵀ) x is the projection of x into the residual space.
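A minimal sketch of how the two statistics can be computed for a new observation follows; the number of retained components a = 2 and the random data are illustrative assumptions only:

import numpy as np

# Fit the PCA model on mean-centered training data (synthetic placeholder here).
X = np.random.rand(1000, 5)
mu = X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(X - mu, rowvar=False))
order = np.argsort(eigvals)[::-1]
L, lam = eigvecs[:, order], eigvals[order]
a = 2                                      # retained components (illustrative)

def t2_statistic(x):
    """Systematic variation of x measured along the first a loading vectors."""
    s = (x - mu) @ L[:, :a]                # scores of the new observation
    return float(np.sum(s**2 / lam[:a]))

def q_statistic(x):
    """Squared length of the projection of x into the residual space."""
    xc = x - mu
    r = xc - L[:, :a] @ (L[:, :a].T @ xc)  # residual r
    return float(r @ r)

x_new = np.random.rand(5)
print(t2_statistic(x_new), q_statistic(x_new))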

Classification
Classification is a type of supervised machine learning algorithm used for prediction when the use case does not involve a continuous variable as the target. A classification algorithm can be used for a two-class problem (e.g. separating a population into affluent and impoverished classes) or a multi-class problem (e.g. risk categorisation into High, Medium or Low). There are multiple algorithms to solve such problems, such as:
– Support Vector Machine
– XGBoost, etc.
These algorithms involve the identification of boundaries, which are rules based on which the test or future data can be classified. Depending upon the data and the use case, the need for a particular classification algorithm is determined.

Figure 2—Classification Model

Support Vector Machine. This is a type of supervised classification model that uses the concept of a hyperplane to distinguish one class from the others. The assumption is that points of the same class shall be clustered together in the feature space on the same side of the separating hyperplane. The classifier is the solution of

maximize M over β0, β1, …, βp, ε1, …, εn

subject to

Σj βj² = 1,
yi (β0 + β1 xi1 + … + βp xip) ≥ M (1 − εi),
εi ≥ 0,  Σi εi ≤ C
C is a tuning parameter that controls the width of the margin and hence the bias-variance tradeoff. When the tuning parameter C is large, the margin is wider, more points violate the margin, and a greater number of support vectors determine the hyperplane.
M is the width of the margin.
εi are the slack variables that allow individual observations to be on the wrong side of the margin or of the hyperplane. A value of εi = 0 means that the observation is correctly classified and lies on the correct side of the margin; a positive value less than 1 means that the observation violates the margin, and a value above 1 indicates that the observation violates the hyperplane (i.e. is misclassified).
The classification of a test observation x* is based on the sign of

f(x*) = β0 + β1 x*1 + … + βp x*p

According to Figure 3, the star observations to the left of the rightmost dashed line have violated the margin but not the hyperplane, whereas the star observations on the other side of the middle line are said to have violated the hyperplane.

Figure 3—Support Vector Classification

The linear support vector classifier does not work well with observations that cannot be separated by a linear boundary. To tackle this problem, support vector machines use kernels to enlarge the feature space. The kernel can be written in the general form K(xi, xi*), where xi and xi* are two observations; it quantifies their similarity.
These kernels can be of different types, such as linear, polynomial and radial. The linear kernel basically captures the similarity between pairs of observations using the Pearson correlation. A polynomial kernel increases the decision space by allowing classification in a higher dimension. A radial kernel lets only training observations near the test observation affect its classification.
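As a brief sketch of how such a classifier might be trained (the paper does not name a specific implementation; scikit-learn and all data and settings below are our assumptions):

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder features (e.g. statistics of RPM, torque, current, THP, WHP)
# and labels (1 = abnormal / pre-failure, 0 = normal operation).
X = np.random.rand(500, 10)
y = np.random.randint(0, 2, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Radial-kernel SVM; note that scikit-learn's C penalizes margin violations,
# so a larger value gives a narrower margin (the inverse of the budget-style
# C described in the text above).
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))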
XGBOOST. This is a scalable machine learning algorithm for gradient boosting. Its speed is quite good, almost as good as artificial neural networks without using GPUs, and the core algorithm is parallelizable. A boosting algorithm combines many weak models to achieve a final powerful model, and extreme gradient boosting builds on previous ideas of gradient boosting. The tree ensemble model uses K additive functions to predict the output:

ŷi = Σ(k=1..K) fk(xi),  fk ∈ F
Here F is the set of regression trees and fk maps xi to a certain output. Each function is a tree with a definite number of leaves (T) and an associated score (ωi) for each leaf. A given tree uses its associated rules to convert an input into a leaf score, and the final prediction is the summation of the leaf scores of all K trees. The aim is the minimization of the following regularized objective:

L(φ) = Σi l(ŷi, yi) + Σk Ω(fk),  with  Ω(f) = γT + ½ λ ‖ω‖²

where l is a differentiable loss function that measures the difference between the prediction and the actual target variable, and Ω is the regularization term that penalizes the complexity of the model to avoid overfitting.
Gradient boosting is an additive model. The new minimization objective is

L(t) = Σi l(yi, ŷi(t−1) + ft(xi)) + Ω(ft)

where ŷi(t−1) is the prediction of the ith instance at the (t−1)th iteration; at each step we add the ft that improves the model the most.
The scalability of XGBoost is due to several important systems and algorithmic optimizations. These innovations include a tree learning algorithm for handling sparse data and a theoretically justified weighted quantile sketch procedure which enables handling instance weights in approximate tree learning. Parallel and distributed computing makes learning faster. More importantly, XGBoost exploits out-of-core computation, enabling the processing of hundreds of millions of examples on a desktop.
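A minimal sketch of fitting such a boosted tree ensemble with the xgboost Python package (the hyperparameters and data below are illustrative assumptions, not the values used in this project):

import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

# Placeholder window features and pre-failure labels.
X = np.random.rand(500, 10)
y = np.random.randint(0, 2, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = XGBClassifier(
    n_estimators=200,    # number of additive trees (K)
    max_depth=4,         # limits the number of leaves T per tree
    learning_rate=0.1,   # shrinkage applied to each new tree f_t
    reg_lambda=1.0,      # L2 penalty on the leaf scores
    n_jobs=-1,           # parallel tree construction
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))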

K-Fold Cross Validation


Here the entire dataset is divided into k folds, or subsets, and k different validations are performed: each time, a different set of (k − 1) folds is used for training and the remaining fold is used for testing or validation. The residuals in the validation set of each of the k validation rounds yield test accuracy metrics, which are then averaged over the k folds. Ideally, each fold should be as heterogeneous and representative of the population as possible, so that the training sets do not become biased across the k iterations.
There are two possible goals in cross-validation:
1. To find the generalizability (or overfitting) of the algorithm used for model building.
2. To compare the performance of two or more different algorithms.
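A short sketch of k-fold validation using scikit-learn; stratified folds are one way (our assumption, not necessarily the paper's) to keep each fold representative when the classes are skewed:

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Placeholder features and labels.
X = np.random.rand(500, 10)
y = np.random.randint(0, 2, size=500)

# Stratified folds keep class proportions similar in every fold, so each
# training set stays representative of the population across the k rounds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv, scoring="f1")
print("per-fold F1:", scores, "mean:", scores.mean())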

Confusion Matrix

Confusion Matrix        Actual Positive        Actual Negative
Predicted Positive      True Positive (TP)     False Positive (FP)
Predicted Negative      False Negative (FN)    True Negative (TN)



This matrix is used to derive various metrics for understanding the performance of a classification algorithm. If an observation is correctly predicted positive, it is known as a true positive; if it is correctly predicted negative, it is known as a true negative. If an observation is incorrectly predicted negative, it is known as a false negative, and if it is incorrectly predicted positive, it is known as a false positive. The matrix further helps to derive various ratios, such as accuracy, precision, recall and the F-score.
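These ratios can be computed directly from the matrix entries; a minimal sketch with scikit-learn and placeholder labels:

import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # actual labels (placeholder)
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # predicted labels (placeholder)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / all
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of the two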

Methodology
This part describes how exactly the algorithms discussed in the Algorithms section have been used to harness the predictive capability of the model; both approaches used as part of this project are discussed in detail. The data required for analysis is fetched in real time from the IP21 historian. Data preprocessing, involving data cleaning (removing erroneous data), data imputation (filling in missing data) and time synchronization (tags are time-synced for easier annotation and model building), is done prior to annotating the data. Annotation of the events, taking domain-specific knowledge into consideration, has been completed for the purpose of supervised model building.
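A sketch of these preprocessing steps in pandas; the tag names, sampling rates and limits below are hypothetical stand-ins for the real-time IP21 data:

import numpy as np
import pandas as pd

# Hypothetical stand-in for tags fetched from the IP21 historian in real time.
idx = pd.date_range("2019-01-01", periods=1000, freq="30s")
raw = pd.DataFrame({"rpm": np.random.rand(1000) * 200,
                    "torque": np.random.rand(1000) * 100},
                   index=idx)

# Data cleaning: drop obviously erroneous readings (illustrative limits only).
clean = raw[(raw["rpm"] >= 0) & (raw["torque"] >= 0)]

# Time synchronization: put every tag on a common 1-minute grid.
synced = clean.resample("1min").mean()

# Data imputation: fill short gaps by interpolating between neighbouring samples.
synced = synced.interpolate(method="time", limit=10)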

Principal Component Analysis Based Statistics Approach


The predictive multivariate analytics model uses the parameters RPM, torque, current, tubing head pressure and well head pressure to derive the T statistic and Q statistic from the principal component analysis (PCA) output. The raw data is mean-centered before performing PCA. The PCA output is essentially a linear combination of the original parameters into new orthogonal parameters (components), arranged in descending order of the variance of the original parameters they capture. The statistics calculated from these new parameters (i.e. the T statistic and Q statistic) are expected to provide different signatures depending on whether the data correspond to a normal or an abnormal condition.
In our case, the first five components of the PCA output are taken. A weighted sum of the two statistics gives us our desired signature. A classification algorithm run on these statistics gives clear boundaries between normal and abnormal operating conditions: given a new point in time, we calculate the statistics and, depending on which side of the boundary the point falls, classify it accordingly. A confusion matrix is then formed from the actual outcomes and the outcomes predicted by the model to analyze its performance, and the ratios derived from this confusion matrix clarify its predictive capability.
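A minimal sketch of the weighted-sum signature and the resulting decision rule; the equal weights are an illustrative assumption, while the 4.6 threshold is the value reported in the Results section:

def combined_signature(t2, q, w_t2=0.5, w_q=0.5):
    """Weighted sum of the two PCA statistics (weights are placeholders)."""
    return w_t2 * t2 + w_q * q

def classify_point(t2, q, threshold=4.6):
    """Flag an operating point as abnormal once the signature crosses the boundary."""
    return "abnormal" if combined_signature(t2, q) > threshold else "normal"

print(classify_point(t2=3.1, q=2.4))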

Bag of Features with Sliding Window Approach


A bag-of-features approach is applied to windows of time-series data comprising tags such as RPM, torque, current, THP and WHP. The window length is chosen based on the use case at hand. A multitude of features, such as the median, mean and entropy of each tag, is calculated for every window, and multiple such sliding windows with 2/3rd overlap are formed. Pre-failure episodes, i.e. windows preceding a failure, are marked/annotated, and after the annotation process these windows are used for training the classification model. Validation of the model is done using the k-fold cross validation technique: the data is divided into k folds and each fold is subsequently taken as a test set, keeping the remaining (k − 1) folds for training the model. A confusion matrix is then formed from the actual outcomes and the outcomes predicted by the model to analyze its performance, and the ratios derived from this confusion matrix clarify its predictive capability.
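A sketch of the windowing and feature extraction; the window length, tag names and the binning used for the entropy feature are illustrative assumptions:

import numpy as np
import pandas as pd
from scipy.stats import entropy

def window_features(series):
    """Summary features for one window of a single tag."""
    hist, _ = np.histogram(series, bins=10)
    probs = hist / max(hist.sum(), 1)
    return {"mean": series.mean(),
            "median": series.median(),
            "entropy": float(entropy(probs + 1e-12))}

def sliding_windows(df, window=90, overlap=2/3):
    """Yield one feature row per overlapping window of the synced tag data."""
    step = max(1, int(window * (1 - overlap)))      # 2/3 overlap -> step = window/3
    for start in range(0, len(df) - window + 1, step):
        chunk = df.iloc[start:start + window]
        row = {}
        for tag in ["rpm", "torque", "current", "thp", "whp"]:  # hypothetical names
            feats = window_features(chunk[tag])
            row.update({tag + "_" + k: v for k, v in feats.items()})
        yield row

# features = pd.DataFrame(sliding_windows(synced))  # 'synced' from the preprocessing sketch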

Results and Conclusion


Initial analysis using PCA on a few wells gave encouraging results, correctly predicting 80% of the points. A weighted-sum statistic threshold of around 4.6 gave a clear distinction between the two groups of normal and abnormal operating conditions for the sample wells on which the analysis was performed. This approach, however, is limited to a single point in time and does not utilize a series of points to derive more meaning. To overcome this, the bag of features approach has been used. The accuracy of the bag of features approach is high, in the range of 90%, along with a high F-score (≥ 0.9), using classification algorithms such as SVM and XGBoost. The skewness of the classes is a problem in the case of cross validation. The results have been tested on a limited set of wells; the generalization of the models needs to be tested on all the wells as part of the project.

References
Abdi, H., Williams, L. J.: Principal Component Analysis. Wiley Interdisciplinary Reviews, 2, 433–459 (2010)
Shao, J.: Linear Model Selection by Cross-Validation. J. Am. Stat. Assoc., 88, 486–494 (1993)
Stone, M.: Cross-Validatory Choice and Assessment of Statistical Predictions (with discussion). J. R. Stat. Soc., Ser. B, 36, 111–147 (1974)
Rodriguez, J. D., Perez, A., Lozano, J. A.: Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 569–575 (2010)
James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning with Applications in R. Springer Science and Business Media, New York (2013)
Ting, K. M.: Confusion Matrix. Springer, New York (2017)
Kim, K. I., Jung, K., Park, S. H., Kim, J. K.: Support Vector Machines for Texture Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 1542–1550 (2002)
Gupta, S., Saputelli, L., Nikolaou, M.: Big Data Analytics Workflow to Safeguard ESP Operations in Real-Time. SPE-181224-MS (2016)
