Early ESP Detection Failure Using AI

EARLY ELECTRICAL SUBMERSIBLE PUMP FAILURE DETECTION USING
ARTIFICIAL INTELLIGENCE DEEP LEARNING LSTM
BACHELOR THESIS
Mohammad Fadillah
12218052
Submitted as partial fulfillment of the requirements for the degree of

BACHELOR OF ENGINEERING
in Petroleum Engineering study program
PETROLEUM ENGINEERING STUDY PROGRAM

FACULTY OF MINING AND PETROLEUM ENGINEERING
INSTITUT TEKNOLOGI BANDUNG
2022
Approved by:
Thesis Advisers,
Silvya Dewi Rahmawati, S.Si., M.Si., Ph.D. (NIP. 198402222014042001)
Marda Vidrianto, S.T., M.T. (PHE OSES)

Mohammad Fadillah*, Silvya Dewi Rahmawati**, and Marda Vidrianto***
Copyright 2022, Institut Teknologi Bandung
Abstract
Electrical Submersible Pumps (ESPs) are among the industry's most common and cost-effective artificial lift techniques. An ESP can pump enormous volumes of fluid by decreasing the bottom hole pressure, allowing oil to flow from the reservoir, and by adding energy from the pump and electric motor to bring the fluid to the surface. However, due to high gas volume, high temperature, and corrosive environments, ESP performance frequently degrades and reaches the point of service interruption without much warning. ESP maintenance is a highly capital-, resource-, and labor-intensive task, traditionally accomplished through reactive process monitoring of multivariate sensor data. The financial impact of an interruption in ESP service is substantial due to production losses and replacement costs. Consequently, developing technology capable of predicting ESP failures is crucial for the oil industry.
This case study uses real-time ESP downhole sensor data to develop an analytical method for detecting ESP failures. Classification is performed on one-hour interval data covering three months (2,675 data points), which have been shaped into slope patterns. Sensor performance is forecast with Long Short-Term Memory (LSTM), which can model problems with multiple ESP downhole sensor input parameters almost seamlessly. LSTM is typically applied to time-series problems in machine learning. The crucial difference between time series and other machine learning problems is that the data samples in a time series occur in sequence. LSTM can therefore classify sequential data, but the ESP failure prediction dataset is not sequential. For this case study, classification was therefore carried out with a supervised learning technique (Decision Tree).
The ESP downhole sensor forecasting model using LSTM achieves an average accuracy of over 90% with errors of less than 10%. This allows ESP failures to be predicted up to 5 days ahead. The algorithms are developed based on nine statuses: closed valve, open choke, low PI, higher PI, increase in water, tubing leak, higher PI, increase in frequency, and sand ingestion, with a 95% accuracy rate. In addition, to reflect field conditions more accurately, this case study uses the latest troubleshooting matrix, which has 27 trips for ESP failure prediction. Artificial Intelligence can be utilized as an effective technology for monitoring ESP systems. A human operator must constantly monitor these automation and control systems to ensure that all processes operate normally. Abnormal behavior is identified in advance, enabling operators to quickly determine the most appropriate corrective action to avoid ESP failure based on the recommendations provided. Moreover, an oil company can generate billions in revenue from ESP failure prevention measures carried out by engineers.
Keywords: ESP, LSTM, Artificial Intelligence, machine learning, forecasting, early failure prediction
Sari (Indonesian abstract)
One of the most common and cost-effective lift methods used in the industry is the Electrical Submersible Pump (ESP). An ESP can pump very large volumes of fluid by reducing the bottom hole pressure, which allows oil to flow from the reservoir and brings the fluid to the surface. However, ESP performance can degrade without warning and reach the point of interruption because of factors such as high gas volume, high temperature, and corrosive environments. ESP maintenance requires high capital, resources, and labor, and is traditionally carried out through reactive process monitoring of multivariate sensor data. The financial impact of an ESP interruption is considerable, both from lost production and from replacement costs. Developing technology for forecasting ESP failures is therefore one of the main tasks of the oil industry.
This case study uses real-time data to build an analytical methodology for detecting ESP failures. Classification is performed on one-hour interval data for a three-month forecast (2,675 data points), which have been shaped into a pattern. Failure prediction uses Long Short-Term Memory (LSTM). LSTM can model problems with several ESP downhole sensor input parameters almost seamlessly. In machine learning, LSTM addresses time-series problems. Compared with other machine learning problems, the important difference in time series is that the data samples are used in sequence. LSTM can thus be used to classify sequential data, but the ESP failure prediction dataset is not sequential. The classification in this case study therefore uses supervised learning: Logistic Regression, Decision Trees, and K-Nearest Neighbors.
The ESP downhole sensor prediction model using LSTM achieves an accuracy of more than 90% with an error of less than 5%. This allows ESP failure to be predicted 3-5 days ahead. The model is built on the distinct characteristics of each parameter for nine statuses: closed valve, open choke, low PI, higher PI, increase in water, tubing leak, higher PI, increase in frequency, and sand ingestion, with an accuracy of more than 95%. In addition, to reflect field conditions more accurately, this case study uses the latest matrix, which has 27 trips for ESP failure prediction. Artificial Intelligence can be used as an effective technology for monitoring ESP systems. These automation and control systems require constant supervision by operators to verify that all processes are running normally. Abnormal conditions are identified in advance, and operators can determine early the best action to avoid ESP failure, based on the attached recommendations. Moreover, an oil company can generate billions in revenue from ESP failure prevention measures carried out by engineers.
Keywords: ESP, LSTM, artificial intelligence, machine learning, early failure prediction
*) Student of Petroleum Engineering Study Program, Institut Teknologi Bandung, 2018 batch
**) Thesis Adviser in Petroleum Engineering Study Program, Institut Teknologi Bandung
***) Thesis Adviser from Pertamina Hulu Energi Offshore Southeast Sumatra
1. Introduction
1.1. Background
Lifting oil to the surface with artificial lift systems is a critical stage in oil production. Among artificial lift options, ESPs are often considered efficient and reliable for pumping high volumes from greater depths and at higher temperatures. However, ESPs represent significant CAPEX and OPEX cost items for operators (Matt Cumings, 2013). The Electrical Submersible Pump (ESP) is an artificial lift system with a high production rate. New ESP installations are also costly components of the artificial lift system. The critical feature of ESP, which explains its wide use in production, is its high efficiency at great depths in wells with complex spatial curvature, a wide range of possible fluid flow rates, and ease of operation.
The ESP's disadvantages are the complexity of the equipment and the fact that repair work cannot be done without lifting the equipment. Furthermore, in case of any breakdown, the equipment must be removed and replaced to continue oil production, which is costly in terms of both time and money. Therefore, one of the critical parameters affecting the operational efficiency of oil production from ESP is the operating time of the equipment. It can be generally defined as the time in days from launch to a failure that requires the equipment to be lifted to the surface for replacement or repair (Shabonas & Khabibullin, 2020).
The ESP is equipped with a sensor fitted to the bottom part of the ESP structure for monitoring the fluid lifting condition. By connecting the sensor to a real-time surveillance system, such as a lift watcher, the user can read real-time data for parameters such as average amps, intake pressure, discharge pressure, intake temperature, motor temperature, and vibration. These data can be used as indicators of specific problems inside the well, which can be seen from changes in the parameter values (Santoso, Vidrianto, & Siagian, 2020). Integrating the downhole sensor with a SCADA (Supervisory Control and Data Acquisition) system raises the level of monitoring. The technology has provided a basic level of automated control in the oil and gas industry for the past decade by continuously monitoring well activity (Ratcliff, Gomez, & Madogwe, 2013). These real-time surveillance systems, related services, and software helped gather numerous historical datasets and ever-increasing real-time data. However, these datasets have extremely large volumes, are complex and can be misleading, and are challenging to analyze using simple visualization and dashboarding tools (Saputelli & Gupta, 2016).
Utilizing big data, a trend over the past decade, is the solution to this issue. Data accumulated over years of spreadsheet and database archiving has allowed industries to gather valuable patterns and work processes that led to success stories (Waskito & Vidrianto, 2019). By utilizing this data, industries have advanced in handling their problems: they avoid ESP shutdowns by moving from a supervised approach toward failure mitigation to a more practical approach based on early predictive analysis and prevention (Bayagub, 2021).
This paper proposes an automated early predictive analysis built on a real-time dashboard to monitor and protect ESP operations by focusing on patterns that deviate from expected normal behavior. Such a system can maximize equipment availability, save millions of dollars in maintenance and lost production, and eliminate the need to deploy many field personnel and instruments to monitor and investigate ESP operations (Gupta, Nikolaou, & Saputelli, 2016).
Furthermore, once successfully implemented in the production process, advances in predicting the potential for ESP failure will increase the operating time of ESPs by optimizing their operation mode or conducting preventive geological and technological measures, and by minimizing production downtime through timely logistics preparations for repairing well stock (Shabonas & Khabibullin, 2020).
1.2. Objectives
The main objectives of this study are:
1. Predict the downhole sensor parameters of the Electric Submersible Pump (ESP) up to the desired time horizon using LSTM time-series analysis.
2. Estimate the virtual flow measurement for missing well-test data values using LSTM and KNN Regressor. Then, compare the LSTM and KNN Regressor results for the prediction model.
3. Identify and compare the problem occurrence pattern in the database of the machine learning model with the most recent troubleshooting matrix.
4. Create an algorithm to detect abnormal behavior and take preventive action to avert failure over a five-day horizon.

2. Basic Theory
2.1. Electric Submersible Pump
Installation and operation of electrical submersible pumps (ESPs) are straightforward. They are able to extract extremely large volumes of oil from highly productive reservoirs. Crooked or deviated holes are not problematic. ESPs can be utilized in offshore operations. In general, lifting costs for large quantities are quite low. ESP applications are limited by the need for high-voltage electricity, inapplicability to multiple completions, unsuitability for deep and high-temperature oil reservoirs, difficult gas and solids production, and expensive installation and maintenance. ESP systems have greater horsepower, operate in hotter environments, are used in dual installations and as spare down-hole units, and include oil/water separation down-hole. Sand and gas issues have prompted the development of new products. System automation includes monitoring, analysis, and control (Guo, Lyons, & Ghalambor, 2007).
Figure 1 depicts the submersible pump system's transformers, motor controllers, junction box, wellhead, and downhole components: motor, seal, pump, and cable. The submersible pumps used in ESP installations are multistage centrifugal pumps operating in a vertical position. Although their constructional and operational characteristics have undergone continuous evolution, their fundamental operating principle has remained the same. After being subjected to large centrifugal forces caused by the high rotational speed of the impeller, produced liquids lose their kinetic energy in the diffuser, where kinetic energy is converted to pressure energy (Takacs, 2018). Each manufactured pump is assigned a performance curve that quantifies the relationship between horsepower, efficiency, and head with respect to the operating flow rate. The catalog performance curve specifies each pump stage's recommended operating range. The operator can determine when the ESP falls outside of its recommended range and then plan for resizing or replacing the ESP to accommodate the actual flow rate (Bayagub, 2021).
2.1.1. Operation and Real-Time Monitoring
The ESP system's performance and expected runtime are highly dependent on the system's proper operation. Therefore, analyzing its operating parameters permits an evaluation of the equipment's actual condition and provides a method for anticipating potential issues. Further, an increased level of monitoring comes from connecting the downhole sensors to a SCADA (Supervisory Control and Data Acquisition) system (Ratcliff, Gomez, & Madogwe, 2013). These systems transmit the information to a centralized location for remote monitoring of ESP and well behavior, eliminating the need for dedicated field personnel to monitor ESP operation continuously and requiring their intervention only when notified of problems via alarms generated by SCADA systems (Gupta, Nikolaou, & Saputelli, 2016).
2.1.2. Surveillance and Failure Modes
Failures in ESP systems are typically electrical, as the electrical system is frequently the weakest link. However, the actual cause of these failures is a different, generally mechanical, problem. For this reason, each failure's root cause must be precisely identified and analyzed. Consequently, failure analysis is a valuable tool for extending the life of ESP equipment (Takacs, 2018).
Once the operator determines that the downhole equipment has "failed," a failure mode can be selected. Typically, it results from an abnormal operating condition identified by the operator via surface instruments, a monitoring/control system, or a well test. Typically, a pull is necessary to repair or replace downhole equipment. Table 1 lists several potential failure modes for ESP installations (Bayagub, 2021).
2.1.3. Downhole Sensor
The downhole sensor is positioned beneath the motor, making it the deepest component of the ESP. Average amps, intake pressure, discharge pressure, intake temperature, motor temperature, and vibration are the parameters that the sensor can monitor. The data is transmitted to the surface via the power cable and is displayed on the motor controller. In addition, using a real-time surveillance system, the user can access data from the supervisory control and data acquisition system (Santoso, Vidrianto, & Siagian, 2020).
2.1.4. Hagedorn and Brown Correlation
As fluid flows from the reservoir to the surface, it loses pressure due to elevation changes and friction. The model of lift performance can therefore be used for well diagnosis and optimization. A Vertical Lift Performance (VLP) analysis also includes a pressure and rate curve that describes the well's lift performance. Calculating the pressure drop in the production tubing for different rates and wellhead conditions yields this relationship. Several correlations can be used to calculate the pressure drop; the Hagedorn and Brown correlation is one commonly used for vertical wells (Hidayat, 2021).
The heart of the Hagedorn-Brown method is a correlation for the liquid holdup; the modifications of the original method include using the no-slip holdup and the Griffith correlation for the bubble flow regime (Economides, Hill, & Ehlig-Economides, 1993).
2.2. Machine Learning
Machine learning is an artificial intelligence (AI) application that enables systems to automatically learn and improve their models over time without being reprogrammed. The first step in the learning process is to observe data and search for patterns in order to make future decisions based on the data provided. Learning a predictive model that maps specific inputs to the desired output is called supervised learning. Several supervised machine learning algorithms are utilized in this study. The selection of an algorithm is typically not straightforward and frequently requires trial and error, as each algorithm has its strengths and weaknesses. Table 2 lists some pros and cons of these algorithms (Geron, 2018).

2.3. Deep Learning in Recurrent Neural Network
Deep learning is a branch of machine learning that models high-level abstractions in data using algorithms built from multiple layers with complex structures, composed of several non-linear transformations (Schmidhuber, 2014). In addition, deep learning can extract patterns from data that help the model distinguish classes. This capability, called feature engineering, also plays a role in achieving good prediction results. The development of deep learning helps solve big data problems such as computer vision, speech recognition, and natural language processing.
Time series are analyzed to comprehend the past and predict the future, enabling managers or policymakers to make well-informed decisions (Nielsen, 2010). RNNs are a type of neural network architecture primarily employed to detect patterns in sequential data, such as language or, in our case, numerical time series. Information is passed through the network in cycles, i.e., the data is fed back into the model to identify sequential patterns in the data (Hochreiter & Schmidhuber, 1997). Figure 2 shows an RNN cell.
2.3.1. Long Short-Term Memory (LSTM)
The LSTM cell improves long-term memory because it allows more parameters to be learned. This makes the LSTM the most powerful recurrent neural network (RNN) for forecasting, especially when the data contain a longer-term trend, and LSTMs are currently among the state-of-the-art models for forecasting (Korstanje, 2021).
LSTM networks are a type of RNN created to address situations where RNNs failed. RNNs operate on current inputs while considering previous outputs (feedback) and retaining them in memory for a brief time (short-term memory). Their main limitation is that they cannot retain data for an extended period. Predicting the current output frequently requires information stored long ago, but RNNs cannot handle such "long-term dependencies" (Hochreiter & Schmidhuber, 1997). Figure 3 shows the repeating module in LSTM (Hyndman & B. Koehler, 2006).
2.3.1.1. Sigmoid
Sigmoid activations are present in the gates. A sigmoid activation works in a manner comparable to the tanh activation, but it compresses values into the range 0 to 1 rather than -1 to 1. This is useful for updating or erasing data because any number multiplied by 0 is 0, causing values to disappear or be "forgotten." Any number multiplied by 1 keeps the same value; therefore, this value is preserved or "kept." The network can thus learn which data is unimportant and can be discarded, and which data must be retained.
2.3.1.2. Forget gate
The forget gate decides which information should be kept and which should be discarded. Information from the previous hidden state and the current input is passed through the sigmoid function. The output values range from 0 to 1: values closer to 0 mean forget, whereas values closer to 1 mean keep. The forget layer is shown in Figure 4.
2.3.1.3. Input gate
The input gate is employed to update the state of the cell. The previous hidden state and the current input are passed to a sigmoid function. This determines which values will be updated by transforming them to the range 0 to 1, where 0 represents irrelevance and 1 represents significance. The hidden state and current input are also passed to the tanh function, which squeezes values into the range -1 to 1 to help regulate the network. The outputs of tanh and sigmoid are then multiplied. The sigmoid output determines which information from the tanh output is important enough to retain. The input layer is shown in Figure 5.
2.3.1.4. Cell gate
First, the cell state is multiplied pointwise by the forget vector. This can reduce values in the cell state when they are multiplied by values close to zero. Then the output of the input gate is added pointwise, updating the cell state with new values that the neural network deems relevant. This gives the new cell state. The cell layer is shown in Figure 6.
2.3.1.5. Output gate
The output gate determines the next hidden state. Note that the hidden state contains information on previous inputs, and predictions are based on the hidden state. First, the previous hidden state and the current input are passed to a sigmoid function. The newly updated cell state is then passed to the tanh function. The tanh output is multiplied by the sigmoid output to decide what information the hidden state should carry. The result is the new hidden state. The new cell state and the new hidden state are then carried forward to the next time step. The output gate is shown in Figure 7.
2.3.2. Accuracy for Forecasting
It is essential to evaluate forecast accuracy using genuine forecasts. The size of the residuals is not a reliable indication of how large true forecast errors are likely to be. The accuracy of predictions can only be determined by considering how well a model performs on new data not used when fitting the model. The second criterion is usually the one considered when classifying forecasting accuracy measures in the literature. Hyndman & B. Koehler (2006) identify forecasting results by measurements based on percentage errors. According to Chicco, Warrens, & Jurman, R-squared is a more informative metric than MAPE for evaluating regression results.
2.3.2.1. Mean Absolute Percentage Error
For errors to be independent of scale, it is customary to express error measures as percentages:
$$\mathrm{MAPE} = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{Y_i - X_i}{Y_i} \right|$$
where $Y_i$ is the actual value, $X_i$ is the predicted value, and $m$ is the number of samples.
Table 3 shows how to interpret MAPE during model evaluation.
MAPE is another performance metric for regression models, and its interpretation in terms of relative error is very intuitive. Due to its definition, it is recommended for use in situations where sensitivity to relative variations is more important than sensitivity to absolute variations (Myttenaere, Golden, & Rossi, 2016). However, it also has some disadvantages. The most significant ones are that its use is limited to strictly positive data by definition and that it is biased toward low forecasts, making it unsuitable for predictive models where substantial errors are anticipated (Armstrong, 1986).
2.3.2.2. R Squared
The coefficient of determination can assume values in the interval $(-\infty, 1]$, depending on the relationship between the ground truth and the prediction model. The significant cases are summarized below (Chicco, Warrens, & Jurman).
$$R^2 = 1 - \frac{\sum_{i=1}^{m} (X_i - Y_i)^2}{\sum_{i=1}^{m} (Y_{mean} - Y_i)^2}$$
R-squared values are grouped into three categories: strong, moderate, and weak. An R-squared value of 0.75 belongs to the strong category, a value of 0.50 to the moderate category, and a value of 0.25 to the weak category. R-squared is not limited to regression; the same formula can be applied to any model to judge whether or not the model is good.

3. Methodology
This study is conducted based on the following workflow:
1. Data Acquisition
Data is retrieved from the cloud system to serve as the simulation database.
2. Forecasting Model Development
The selected model for predicting sensor parameters is a time-series model, built with the steps outlined below:
• Filtering data within the desired time interval and resampling the dataset to hourly intervals.
• Creating two types of datasets from the prepared data: training and testing data sets.
• Selecting a model that fits the training dataset and evaluating the model's performance on the testing dataset.
• Forecasting the future by re-fitting the model to the entire data set.
Forecasting of downhole sensor parameters: as a form of justification for further analysis, forecasting results will be compared side by side with the original sensor data.
3. Classification Model Construction
Processed data from several wells will be classified using machine learning, with the following steps:
• Calculate the flow rate value for every dataset using the model supplied by the related company.
• Compare the flow rate calculated by LSTM and by KNN Regressor. Use the result of the method with the smaller error.
• Analyze a machine learning model by comparing the outcomes of multiple methods.
• Compare the result of the machine learning model with the updated troubleshooting matrix of failures.
4. Failure Prediction
The machine learning model generates a prediction using the predicted sensor data as the new input, based on five-day windows.
5. Prescription of Preventive Action
Proactive suggestions for corrective action to address the failure are developed.
The detailed workflow of this study is further described in Figure 8.
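The forecasting steps of the workflow (hourly resampling, a chronological 80:20 split, windowing) and the MAPE and R-squared definitions from Section 2.3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names, the synthetic readings, and the look-back of 3 are hypothetical, and a simple persistence forecast (repeat the last observed value) stands in for the LSTM so that the example runs with the Python standard library only.

```python
from datetime import datetime, timedelta
from statistics import mean


def resample_hourly(records):
    """Average (timestamp, value) records into one value per clock hour."""
    buckets = {}
    for ts, value in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(hour, []).append(value)
    return [(hour, mean(vals)) for hour, vals in sorted(buckets.items())]


def chronological_split(series, train_fraction=0.8):
    """Split without shuffling so the test set lies after the training set."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(values, look_back):
    """Build (input window, next value) pairs for a sequence model."""
    return [(values[i:i + look_back], values[i + look_back])
            for i in range(len(values) - look_back)]


def mape(actual, predicted):
    """MAPE as a fraction (multiply by 100 for percent); actual must be non-zero."""
    return mean(abs((y - x) / y) for y, x in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_mean = mean(actual)
    ss_res = sum((x - y) ** 2 for y, x in zip(actual, predicted))
    ss_tot = sum((y_mean - y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


# Synthetic pressure-like readings every 30 minutes (illustrative values only).
start = datetime(2020, 4, 11, 14, 0)
raw = [(start + timedelta(minutes=30 * k),
        1500.0 + 0.5 * k + (5.0 if k % 2 else -5.0))
       for k in range(400)]

hourly = resample_hourly(raw)
train, test = chronological_split([value for _, value in hourly])
windows = make_windows(test, look_back=3)

# Placeholder forecaster: repeat the last value of each window (persistence).
actual = [target for _, target in windows]
predicted = [window[-1] for window, _ in windows]

print(len(hourly), len(train), len(test))
print(mape(actual, predicted), r_squared(actual, predicted))
```

Keeping the split chronological rather than random matters for time series: the test set then contains only data that lie after the training period, which matches the requirement that accuracy be judged on data not used when fitting the model.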
4. Case Study
Several significant ESP operation-related parameters were utilized as model input variables. These included well inflow parameters like fluid pressure and temperature at the pump intake, tubing pressure, flow rate, and tubing diameter; pump performance parameters like discharge pressure, pump setting depth, stages, and pump type; and motor diagnostic parameters like vibration and motor temperature (Faradila).
Two different data records were tagged for the study. These are:
1. A data record containing time-series information on various parameters from the downhole gauges. The data was recorded at a one-hour interval for one well.
2. A data record containing information on when a trip or failure occurred in that well.
The data come from seventeen wells equipped with lift watchers and covering four types: RC1000, RC2500, SN2600, and SN3600.
This record was utilized to analyze the behavior of patterns based on the selected parameters obtained from the historian before, during, and immediately following the trip or failure event (Gupta, Nikolaou, & Saputelli, 2016).
4.1. Downhole Sensor Parameters Forecasting
This step aims to predict the downhole sensor parameters, including motor temperature, current consumption, motor vibrations, and the temperature and pressure at the intake and discharge sections of the pump, for the next five days.
Downhole sensor forecasting uses a multivariate time-series LSTM (Long Short-Term Memory) model. A multivariate model reflects the field situation, in which the downhole sensor readings are interconnected when predicting any one downhole sensor parameter. LSTM performance is therefore affected by the amount of data available.
The historical data used in LSTM forecasting was obtained from February 2019 to August 2020. However, the large amount of data can introduce noise into the forecasting model because gaps in the data break the time ordering of the history. Therefore, the data was resampled to one-hour intervals from April 11, 2020, 02.10 PM to July 31, 2020, 00.00 PM. Resampling yields 2,675 data points, shown in Table 4. The data is then divided 80:20 into training and test datasets to learn the relationship between the independent variables and the target variable in terms of a mathematical function or set of rules. The discharge pressure over the selected time interval and dataset split can be seen in Figure 9. The forecasting results will be integrated with the original dataset appropriately.
4.2. Flow Rate Reconstruction
Flow rate is a factor that must be accounted for when predicting ESP failure based on the readings of downhole sensor parameters. Due to the limitations of the performed well tests, the flow rate will be estimated using the model supplied by the associated company.
The flow rate reconstruction requires static, profile, and dynamic data, which can be continuous or time-series. "Static" implies that there is no change in the data for an extended period of time in the life of an ESP well, barring special circumstances. Static data includes information regarding pump installation and well completion. Dynamic data, in contrast, vary naturally because they reflect the behavior of the well and the pump operation. They consist of:
• Surface Pressure Gauges Data: Casing pressure and tubing pressure.
• Downhole Gauge Data: Pump discharge pressure, pump intake pressure, ampere reading, and intake temperature.
The pre-processing of the 121,462 raw data points is shown in Table 5. Pre-processing began with scaling all numerical features into the range 0 to 1, followed by hyperparameter tuning to determine the optimal model parameters.
The LSTM model can perform regression to determine virtual flow rates. The 121,462 pre-processed and scaled data points are first analyzed with EDA (Exploratory Data Analysis) before being used in the LSTM model for regression. EDA helps detect errors, identify outliers in data sets, and understand relationships between data. After EDA, the raw data are reduced to 6,071 data points, listed in Table 6.
In this case study, the virtual rate regression compared the results obtained from LSTM and KNN Regressor. The KNN Regressor result is taken from the previous model (Bayagub, 2021). The method with the smallest error is used in the classification model.
The k-nearest neighbors algorithm, chosen here to regress the virtual flow rates, learns from "how similar" a data point is to others and works well with a large amount of data (Faradila).
4.3. Early ESP Failure Prediction
At this stage, the well-data-integration engine will apply statistical techniques to all predictor values, reducing noise and identifying patterns and correlations between the variables.
The algorithm enables the development of an agreed-upon, mathematically verified pattern for all of the prevalent ESP failures. For example, suppose the current well data indicate a trend that matches the pattern of a previous failure. In that case, the algorithm generates an alarm for the specific issue and sends it to the field technicians at any time of day.
To reflect actual field conditions more closely, this case study uses the updated troubleshooting matrix, which contains 27 troubleshooting entries, as shown in Table 7.
The technician can then highlight any underlying causes that may have hindered ESP performance and provide simultaneous evaluations for the remedial course of action.
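The pattern matching described in Section 4.3 can be illustrated with a small sketch: the slope of each sensor over a window is reduced to a trend direction, and the resulting signature is compared with the rows of a troubleshooting matrix. Everything specific here is an assumption made for illustration: the two matrix rows are hypothetical placeholders and not the entries of Table 7, the sensor names and the slope tolerance are invented, and the study's own classifier may work differently.

```python
from statistics import mean

# Hypothetical excerpt of a troubleshooting matrix: expected trend direction
# (+1 rising, -1 falling, 0 flat) per sensor. Illustrative placeholders only,
# not the entries of Table 7.
TROUBLESHOOTING_MATRIX = {
    "closed valve": {"intake_pressure": +1, "discharge_pressure": +1, "amps": -1},
    "tubing leak": {"intake_pressure": +1, "discharge_pressure": -1, "amps": 0},
}


def slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def trend_sign(values, tolerance):
    """Classify a window as rising (+1), falling (-1) or flat (0)."""
    s = slope(values)
    if s > tolerance:
        return 1
    if s < -tolerance:
        return -1
    return 0


def match_failure_patterns(window, matrix, tolerance=0.05):
    """Return the conditions whose trend signature matches the sensor window."""
    observed = {name: trend_sign(vals, tolerance) for name, vals in window.items()}
    return [condition for condition, signature in matrix.items()
            if all(observed.get(sensor) == sign
                   for sensor, sign in signature.items())]


# Window of forecast sensor values (synthetic numbers for illustration).
forecast_window = {
    "intake_pressure": [500, 510, 521, 530, 541],
    "discharge_pressure": [1800, 1815, 1829, 1846, 1860],
    "amps": [42.0, 41.0, 40.1, 39.0, 38.2],
}
alarms = match_failure_patterns(forecast_window, TROUBLESHOOTING_MATRIX)
print(alarms)  # ['closed valve']
```

In practice the tolerance would need to be chosen per sensor, since a slope of 0.05 units per sample means something different for pressure than for motor current.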
5. Result and Discussion
5.1 Forecasting Model Development
5.1.1 Define and Fit Model LSTM
The process begins with splitting the data at 80:20. The training set obtained after splitting contains 2141 records from 16044. The amount of data and the number of layers both influence the LSTM model. The parameters tuned on the LSTM are the hidden layer, learning rate, look_back, num_epochs, and batch_size. The hidden layer size in the default LSTM template is 128 and is usually changed following the 2^n formula, such as 32, 64, 512, 1024, etc. However, this model uses a hidden layer of 256, as shown in Figure 10.
Learning rate is one of the training parameters used to calculate the weight correction value during the training process. The learning rate value ranges from zero (0) to one (1). The greater the learning rate, the faster the training process runs, but the network accuracy tends to decrease. The learning rate in the LSTM model is handled by the Adam optimizer ("Adam: A Method for Stochastic Optimization"). Adam can achieve excellent and fast results because it uses first- and second-moment gradient estimates to adapt the learning rate for each neural network weight.
Epochs represent the number of iterations that must be performed on a dataset. An epoch is one learning cycle of the deep learning algorithm over the entire training dataset, so one epoch means the algorithm has seen the whole training dataset once. The number of epochs depends on the size of the dataset. The number of epochs used in this model is 50.
Batch size is the number of data samples passed to the neural network at a time. For example, suppose we have 100 samples and a batch size of five. In that case, the algorithm takes the first five samples (1st, 2nd, 3rd, 4th, and 5th) and trains the neural network on them, then retrieves the next five samples (6th, 7th, 8th, 9th, and 10th), and so on until all 20 batches of five samples have been used (100/5 = 20). This LSTM uses a batch size of one.
Early stopping in the LSTM model prevents overfitting by stopping the model training. The patience argument is a keyword set in the early stopping callback. The default setting for patience is zero, in which case training is terminated as soon as the performance metric fails to improve from one epoch to the next. This may not be optimal, given that the model's performance is noisy and fluctuates from epoch to epoch. The trend, however, should be improving.
For this reason, the patience is set to several epochs, five in this case. The training will then end if there is no improvement in the monitored performance measure over the previous five epochs. The parameters of the LSTM model used in forecasting are shown in Figure 11.
Furthermore, the validation of the selected LSTM model can be visualized in Figure 12 through the resulting loss on Discharge Pressure. The loss plot visualizes the difference in error between the test data and the training data; the gap between the two is small.
Furthermore, the forecasting accuracy is calculated using MAPE and R Squared. The MAPE and R Squared results are listed in Table 8. The MAPE and R Squared results differ from one downhole sensor to another. However, the MAPE for each downhole sensor is no more than 10%, so the LSTM model falls in the "highly accurate forecasting" class. None of the R Squared results is negative and all are close to 1, so the LSTM model shows a good correlation for the six downhole sensors in forecasting.
5.1.2 Evaluate and Entire Dataset
The LSTM model that has been built is the best result obtained by tuning hidden layers, epoch, learning rate, and batch_size. This case study initially aimed to predict a failure in August 2020. However, under field conditions at that time, there was no failure. Therefore, the end of the data set, initially August 1, 2020, was changed to July 1, 2020. So, in this case, the LSTM model is used to predict two periods, namely July and August.
The application of the LSTM model to the six forecasted downhole ESP sensors is the same, so the discussion will focus on the Discharge Pressure sensor. The resulting dataset contains 10282 records, down from the original 16044. The MAPE value of each model is less than 10, but the R-Squared value is not close to 1. The result of the dataset change is in Figure 13. The next step is to change the hidden layer, initially 256, to 128 and 512. The result of the hidden layer change is in Table 9. The table shows that the R score remains negative and that the number of hidden layers does not significantly affect the LSTM model. The best tuning result for July was obtained with data from April 20, 2020, 14.10 to July 1, 2020, 00.00 (1715 records) and a hidden layer of 256. The LSTM model for July can be seen in Figure 14, and its MAPE and R scores are in Table 10.
Table 11 compares the August and July LSTM models. The best results come from the model trained with data up to August, because more data were used. Therefore, the July prediction uses the model trained with data up to August. The fitted models for the six downhole sensors are shown in Figure 15.
5.2 Downhole Sensor Forecasting
The fitted LSTM model is used to forecast the downhole sensors five days ahead with a data interval of 1 hour. For validation, the downhole sensor forecast is compared to the original data for August 1, 2020, to August 9, 2020. Each sensor in the table had an error of less than 10%. The amount of time is longer than five days in terms of predictions and comparing
actual data because the LSTM can handle long sequences and is very suitable for forecasting, as evidenced by Table 12. Downhole sensor parameters are interrelated in the forecasts. The prediction results can be seen in Figure 16.
For July, the forecast covers June 28, 2020, 00.00 to July 3, 2020, 14.00 and correctly predicts the next five days. However, it differs from August because the data available to determine the error in the prediction results are limited. The error results obtained are in Table 12.
5.3 Re-Construction Flow Rate Model
The LSTM model used in regression is almost the same as the one used in forecasting. The data is divided into 12 input variables (casing pressure, tubing pressure high, PBHP, PBHT, ESP max, pump type, STG, HP, Volt, AMP, PSD, and tubing id), with BFPD as the output to be regressed.
5.3.1 EDA (Exploratory Data Analysis)
K-Means clustering is used in EDA through the Orange application. The K-means clustering algorithm groups data based on the distance between each data point and the centroid of its cluster, obtained through an iterative process. The initial stage ranks outliers using the one-class SVM method with a nu parameter of 85% and a kernel of 0.01.
5.3.2 Comparing LSTM Model and KNN-Regressor
The LSTM model used as the regressor has a hidden layer of 256, 50 epochs, and a validation split of 0.2, obtaining a MAPE value of 4.32. MAPE results and model validation loss can be seen in Figure 17 and Figure 18. In addition, a blind data test was conducted to validate the model's accuracy by predicting additional well test data for 43 different wells, with an average error of 12.7%, as shown in Table 13. The KNN-Regressor reached up to 99.59% accuracy, which possibly indicates overfitting. Its blind data test on well test data from 43 wells resulted in an average error of 5%, as shown in Table 14 (Bayagub, 2021).
The KNN-Regressor has a lower blind-test error than the LSTM because the dataset is not sequence data, whereas the LSTM is designed for sequential data, so the raw data had to be converted to a sequential format. The LSTM therefore depends on the dataset, and EDA is done first, which reduces the dataset from 121462 to 6071 records.
LSTM networks require significantly more training data than KNN to achieve adequate accuracy, and LSTM requires more hyperparameter tuning than KNN. Therefore, the best model for flowrate regression is the KNN-Regressor, whose error percentage is still acceptable. The model will be applied to the data gathered in the preceding section to predict the virtual rate.
Table 15 and Table 16 show the result of rate reconstruction for each parameter value of the downhole gauge. The predicted values were then compared to the nearby flow rate derived from well test data, which was 278.66 for the August prediction and 291.92 for the July prediction; the values are very similar.
5.3.3 Pressure Drop Calculation: Hagedorn-Brown Correlation
The recommendation from the paper is to calculate the pressure drop using a correlation. The correlation chosen for this case study is Hagedorn-Brown. Pressure drops, tubing diameter, and pump installation are the essential components for rate calculation. Due to the limitations of the conducted well test, however, the tubing pressure will be estimated by converting the discharge pressure from each sensor reading using the formulas below.
Hagedorn-Brown Correlation
$$144\,\frac{dp}{dz} = \bar{\rho} + \frac{f\,m^{2}}{7.413\times10^{10}\,D^{5}\,\bar{\rho}} + \bar{\rho}\,\frac{\Delta\left(u_m^{2}/2g_c\right)}{\Delta z}$$
$$T_p = D_p - \Delta P$$
Here $\bar{\rho}$ is the in situ average density; the remaining symbols are listed in the Nomenclature.
The data and assumptions used in calculating the pressure drop with the Hagedorn-Brown correlation are in Table 17. However, casing pressure will be handled differently by utilizing the same value as the nearby well test data. This is because the value of casing pressure varies only slightly from well test to well test, ranging from 5 to 10 psi.
The casing pressure data used in the prediction models for July and August differ, as do the pressure drop values, because of differences in GOR values.
The casing pressure for the flowrate regression of the July model uses the well-test casing pressure on May 25, 2020, which is 50.02 psia, and the pressure drop from the Hagedorn-Brown calculation is 3235.29 psia.
The casing pressure for the flowrate regression of the August model uses the well-test casing pressure on July 18, 2020, which is 60.02 psia, and the pressure drop from the Hagedorn-Brown calculation is 3311.4 psia. This approach was taken so that the results are in accordance with field conditions.
5.4. Classification Model Development
5.4.1. Machine Learning Method
A typical issue at this stage is that the prepared data consists of multiple units for every real-time parameter of the well. Table 18 lists the change between the current and preceding readings of each parameter, which converts the data into a slope. In addition, each data set must be labeled per the company's specifications, detailed in Table 19. The pattern recognition of the parameters that led to these issues is described in Table 20.
The dataset is split into training and test datasets at around a 75:25 ratio, sampled across all the categories. The machine learning method can be chosen based on the accuracy obtained on the oversampled data. For example, the model accuracies using logistic regression, random forest, Support Vector Machine (SVM), K Nearest Neighbor (KNN), and decision tree are 86.16%, 99.96%, 98.88%, 99.8%, and 99.85% (Bayagub, 2021).
5.4.2. Update Matrix Troubleshooting
The matrix update is considered a complement to and validation of the completed machine learning model's results. However, for the results generated by the latest troubleshooting update, a simple "conditional" model is used. This is due to the limitations of the LSTM model itself, and further studies are required to create a model from the troubleshooting matrix update.
5.4.3 Failure Prediction
The inputs are the forecasted downhole sensor parameters together with the constructed virtual rate, in the same format as the prepared data. Over a 5-day window, the five machine learning models run the failure prediction. For more detail, the results are compared with the updated troubleshooting matrix so that the predictions are more varied and closer to real conditions. The prediction results were then compared to the related company's historical data for validation.
The machine learning model used in prediction is the Decision Tree, which reaches an accuracy of up to 99.85%. The failure prediction data runs from August 1, 2020, 00.00 to August 5, 2020. The Decision Tree results show that the ESP does not experience any trouble. The updated troubleshooting matrix's prediction results showed the same thing: there was no problem with the ESP. The prediction results are shown on the sensor chart in Figure 19 and in Table 21.
The failure prediction was also carried out for a different period, namely July. The data selected for July begins on June 28, 2020, at 04:00, and ends on July 3, 2020. The failure prediction results indicated trip 1, or Low PI, with the Decision Tree model. The results of the updated troubleshooting matrix indicated that the ESP had a low cooling motor. The prediction results are shown on the sensor chart in Figure 20 and in Table 22; the validation table comparing the prediction results with company history data indicates that the ESP experienced a High-Temperature Motor on July 3, 2020.
5.4.4. Prescription of Preventive Action
The notification system's logic is based on pattern recognition for event detection and prediagnostic applications. These notifications allow for fast and automated operational corrections to maintain optimal pump operation. For example, a typical correction might include suggesting a pressure test, contacting the Field Service Tech, or providing a detailed explanation, as in Table 23. Table 24 illustrates the output of the surveillance system, including the prescription of preventive action for a specific predicted failure.
6. Conclusion
1. LSTM is used to forecast the downhole parameters installed on the Electric Submersible Pump (ESP) five days in advance with an acceptable percentage error and R Score, which is classified as a highly accurate forecast and eligible for further processing and analysis.
2. The virtual flow rates are reconstructed using an LSTM based on "how similar" data from the provided well test data, with an average error of 12.6% (result from the blind test), which is larger than the KNN-Regressor's error of 5%.
3. There are 27 types of recorded problems based on the updated troubleshooting matrix, shown in Table 7. The updated troubleshooting matrix is more informative and detailed about the troubleshooting.
The ESP failure predicted five days in advance is Low Productivity Index (Decision Tree) and Low Cooling Motor (updated troubleshooting matrix). This is consistent with the actual condition, a High-Temperature Motor.
4. The combination of multivariate forecasting LSTM and machine learning with a Decision Tree is chosen as the surveillance system for detecting abnormal data and generating predictive results compatible with the company's historical data. In addition, the surveillance system is also equipped with the prescription of preventive action provided by the related company.
7. Recommendation
1. The data is limited since not all wells have a real-time monitoring system on the ESP downhole gauge. Adding a real-time surveillance system to other wells, if possible, would increase the amount of data collected.
2. If sensor readings are obtained at constant time intervals, the time filtering and downsampling processes are no longer necessary. It is believed that this would expedite the database creation procedure.
3. The API RP 11S1 and ISO 14224 standards make early ESP failure prediction more accurate. A failure mode describes an ESP system failure, failed item(s), failure descriptor(s), and failure cause, which is illustrated in Figure 21.
4. The values of horsepower and voltage should not rely on installation data and should instead be estimated using new approaches, since these are among the most influential parameters for flow rate reconstruction.
5. Using the Orange application to identify outliers in EDA is not accurate; EDA should be done in Python.
6. Create a dataset for the updated troubleshooting matrix so that it can be used in supervised learning.
8. Acknowledgment
The author would like to thank Allah SWT and His blessings for allowing this study to be conducted accurately.
The acknowledgments are also extended to those who have been involved, supportive, and guiding throughout the author's studies at Bandung Institute of Technology and the completion of this bachelor's thesis, including:
1. Beloved family for the ceaseless support, motivation, and prayers during the whole study.
2. Mrs. Silvya Dewi Rahmawati, S.Si., M.Si., Ph.D. as the first thesis adviser, for the guidance and for motivating my interest from the artificial lift course until the thesis was finished.
3. Mr. Marda Vidrianto, Mr. Andang Suharji, and the Artificial Lift team at Pertamina Hulu Energi Offshore Southeast Sumatra for providing the data and the guidance during the thesis study.
4. Mrs. Faradilla Bayagub and Mr. Fahmi Hidayat, who spared time out of their busy schedules to facilitate discussions and mentoring.
5. Fadhil Rafi, who spared time out of his busy schedule for discussions and mentoring about Python.
6. Al Rizki Dwi Lanang and Faiz Alfarisi as the author's thesis partners at Pertamina Hulu Energi Offshore Southeast Sumatra, for brainstorming about the machine learning study.
7. Farah Saphira, whose encouraging words and thoughtful yet detailed feedback have been very important to me.
8. Gading Arganu, Mahmuda Nurhadi, Fajril Afkaar, and Widi Rossita for the support during the study.
9. All lecturers and staff within the Bandung Institute of Technology Petroleum Engineering Department, for the knowledge and support throughout the author's undergraduate study.

9. Nomenclature
$D_p$ = Discharge pressure of the downhole sensor
$f$ = Friction factor
$m$ = Total mass flow rate (lbm/d)
$\bar{\rho}$ = In situ average density
$u_m$ = Mixture velocity (ft/sec)
$\Delta P$ = Result from the Hagedorn-Brown calculation
HP = Horse Power
STG = Stages
PBHP = Producing Bottom Hole Pressure
PBHT = Producing Bottom Hole Temperature
AMP = Ampere
LSTM = Long Short-Term Memory
KNN = K-Nearest Neighbour
PSD = Pressure Setting Depth

10. References
Alhanati, F., S.C, S., & T.A, Z. (2001). ESP Failures: Can We Talk the Same Language? SPE 148333, 11.
Armstrong, J. S. (1986). Long-Range Forecasting From Crystal Ball to Computer. International Forecasting Journal, 4.
Bayagub, F. H. (2021). Early Electric Submersible Pump Failure Detection Using Artificial Intelligence. Bachelor Thesis in Petroleum Engineering ITB, 26.
Chicco, D., Warrens, M., & Jurman, G. (n.d.). The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation.
Economides, M. J., Hill, A., & Ehlig-Economides, C. (1993). Petroleum Production Systems. New Jersey: Prentice Hall PTR.
Geron, A. (2018). Hands-On Machine Learning with Scikit-Learn and TensorFlow. United States of America: O'Reilly Media.
Guo, B., Lyons, W., & Ghalambor, A. (2007). Petroleum Production Engineering: A Computer-Assisted Approach. Louisiana: Elsevier Science & Technology Books.
Gupta, S., Nikolaou, M., & Saputelli, L. (2016). ESP Health Monitoring KPI: A Real-Time Predictive Analytics Application. SPE-181009-MS, 10.
Hidayat, F. (2021). New Production Optimization Workflow: Coupling the Well Modeling and Field Optimization. Bachelor Thesis in Petroleum Engineering, 22.
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation 9(8):1735-1780, 1997, 32.
Hyndman, R. J., & B. Koehler, A. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting 22, 10.
Korstanje, J. (2021). Advanced Forecasting with Python With State-of-the-Art-Models Including LSTMs, Facebook's Prophet, and Amazon's Deep AR. Maison Alfort: Apress.
Matt Cumings, M. A. (2013). ESP Surveillance and Optimization Solutions: Ensuring Best Performance and Optimum Value. SPE 164382, 14.
Myttenaere, A. D., Golden, B., & Rossi, F. (2016). Mean Absolute Percentage Error for regression models. Neurocomputing, 11.
Nielsen, F. (2010). Introductory time series with R. Journal Of Applied Statistics, 12.
Ratcliff, D. E., Gomez, C., & Madogwe, O. (2013). Maximizing Oil Production and Increasing ESP Run Life in a Brownfield Using Real-Time ESP Monitoring and Optimization Software: Rockies Field Case Study. SPE 166386, 11.
Santoso, H. B., Vidrianto, M., & Siagian, U. W. (2020). Predicting Failure in Electric Submersible Pump by Utilizing Machine Learning Based on Real-Time Sensor Data. IATMI Professional Technical Paper, 23.
Saputelli, L., & Gupta, S. (2016). Big Data Analytics Workflow to Safeguard ESP Operations in Real-Time. SPE-181224-MS, 14.
Schmidhuber, J. (2014). Deep learning in neural network: An Overview. Elsevier Neural Network, 33.
Shabonas, A. R., & Khabibullin, R. A. (2020). Prediction of ESPs Failure Using ML at Western Siberia Oilfields with Large Number of Wells. SPE-2011881-MS, 16.
Takacs, G. (2018). Electrical Submersible Pumps Manual. Cambridge, United States: Gulf Professional Publishing.
Waskito, L. B., & Vidrianto, M. (2019). Initiate Digital Oil Field Application at Mature Offshore Oil Field in South East Sumatera, Indonesia. SPE-196398-MS, 6.
List of Figures
Figure 1 Main Part of An ESP (Alhanati, S.C, & T.A, 2001)
Figure 2 RNN Cell
Figure 3 LSTM Cells
Figure 4 Illustration of the Forget Gate
Figure 5 Illustration of the Input Gate
Figure 6 Illustration of the Cell Gate
Figure 7 Illustration of the Output Gate
Figure 8 Methodology
Figure 9 Selected Time Interval and Model LSTM Forecasting of Discharge Pressure
Figure 10 Hidden Layer of LSTM Model
Figure 11 Result Tuning of LSTM
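To make the roles of look_back and batch_size from Section 5.1.1 concrete, the following sketch builds sliding windows from a univariate series and counts the batches per epoch. The helper names and the toy series are hypothetical, and the look_back of 3 is chosen purely for illustration because the text does not state the value used.

```python
import math


def make_windows(series, look_back):
    """Build (inputs, target) pairs: look_back past values predict the next value."""
    inputs, targets = [], []
    for start in range(len(series) - look_back):
        inputs.append(series[start:start + look_back])
        targets.append(series[start + look_back])
    return inputs, targets


def batches_per_epoch(n_samples, batch_size):
    """Number of batches (weight updates) in one pass over the samples."""
    return math.ceil(n_samples / batch_size)


series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # toy values, not sensor data
X, y = make_windows(series, look_back=3)
print(X[0], y[0])
print(batches_per_epoch(100, 5))  # the worked example: 100 samples, batch size 5
print(batches_per_epoch(100, 1))  # batch size of one, as used for this LSTM
```

A batch size of one gives the most frequent weight updates per epoch, at the cost of longer training time for the same number of epochs.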
Figure 12 Model Loss LSTM
(a.Discharge Pressure, b. Motor Temperature, c. Average Ampere, d. Intake Temperature, e. Intake Pressure, f. Vibration)
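The patience setting discussed in Section 5.1.1 acts on loss curves like these. The sketch below is a simplified stand-in for an early-stopping rule, not a library callback: it stops once the validation loss has failed to improve on its best value for `patience` consecutive epochs. The loss values are invented for illustration.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never stops.

    Simplified rule (an assumption, not a library's exact behaviour):
    stop after `patience` consecutive epochs without a new best loss.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# A noisy but initially improving validation loss (made-up numbers).
losses = [0.90, 0.70, 0.72, 0.69, 0.70, 0.71, 0.70, 0.72, 0.73, 0.74]
print(stopping_epoch(losses, patience=1))  # stops at the first fluctuation
print(stopping_epoch(losses, patience=5))  # tolerates noise, stops later
```

With a patience of one, the run ends before the loss reaches its later minimum of 0.69; a patience of five lets training pass through the fluctuation, which is the motivation given in the text.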
Figure 13 Result Model LSTM, data changed
Figure 14 Model LSTM and Model July on July
Figure 15 Visualization Model LSTM Fit Result Forecasting Downhole Sensor
Figure 16 Result Forecasting LSTM in 5 days
Figure 17 Model LSTM For Regression
Figure 18 Model Loss LSTM
Figure 19 Graph Result of Forecasting in August
Figure 20 Graph Result of Forecasting in July
Figure 21 ISO 14224 and API RP 11S1 Standard
List of Tables
Table 1 ESP Installation Failure
Category | Cause | Characteristics
Poor Reservoir Performance | Insufficient well inflow | Pump off due to low liquid rate, downtime, and low pump efficiency.
Poor Reservoir Performance | Free gas interference | Low pump efficiency due to high gas-liquid ratio because high casing pressure pushes casing gas through the pump, or pump setting depth is too shallow.
Poor Reservoir Performance | Waxing or emulsification | Throated tubing by waxing causes a low production rate. Production resumes typically after dewaxing by heating or physical removal.
Downhole Equipment Failures | Reverse-rotation of ESP | Reverse-rotations of impellers disable the ESP. This is caused by insufficient pump capacity to remove all fluids flowing from reservoir to well.
Downhole Equipment Failures | Downhole leakage | Pump throughput drops due to leakage in the draining valve, tubing coupling, or pressure gauge.
Downhole Equipment Failures | Pump mechanical failure | The oil rate drops due to the ESP shaft break. Abrasions of impellers induce high slippage and low pump efficiency.
Surface Facility Issue | Surface flow blockage | Waxing in flowline, small choke flow area, and choke plugged by produced sand or waxing.
Table 2 Supervised Algorithm
Algorithm | Strengths | Weaknesses
Logistic Regressions | 1. Robust to noise. 2. Good interpretability because the output is the probability. | 1. Requires data preparation. 2. Handles only linear decision boundaries.
Support Vector Machine | 1. Works well in the complicated domain. 2. Works well with outliers. | 1. Requires data preparation. 2. Kernel selection can be tricky. 3. Poor performance and long computation time if the data set is large and noisy.
Random Forest | 1. No effort for data preparation. 2. Able to rank feature importance. 3. Works well in high-dimensional spaces. | 1. Lack of interpretability if tree quantity is large. 2. May overfit if data is noisy.
K-Nearest Neighbor | 1. Very easy to implement for a multiclass problem. 2. Constantly evolves by responding quickly to changes in the input during real-time use. | 1. Doesn't perform well on imbalanced data. 2. Very sensitive to outliers.
Decision Tree | 1. Does not require any normalization of data. 2. Missing values do not affect the process of building a model. | 1. Inadequate for applying regression and predicting continuous values. 2. Involves higher time to train the model.
Table 3 MAPE Value for Forecasting Power
MAPE Value (%) Forecasting Power
< 10 Highly Accurate Forecasting
10 – 20 Good Forecasting
20 – 50 Reasonable Forecasting
> 50 Weak and Inaccurate Forecasting
Table 4 Raw Data for LSTM Model
Well | Reading Time | Average Ampere (A) | Intake Pressure (psi) | Discharge Pressure (psi) | Intake Temperature (°F) | Motor Temperature (°F) | Vibration (Gravit)
FHB-03 | 2020-04-11 14:00:00 | 41.520000 | 1326.23200 | 4545.39000 | 296.404000 | 338.894000 | 0.742000
FHB-03 | 2020-04-11 15:00:00 | 37.215152 | 1206.21651 | 4647.44454 | 294.87424 | 333.012273 | 0.796515
FHB-03 | 2020-04-11 16:00:00 | 33.876667 | 1124.3133 | 4511.421667 | 292.32667 | 327.60333 | 0.700000
FHB-03 | … | … | … | … | … | … | …
FHB-03 | 2020-08-01 00:00:00 | 25.7000000 | 471.10000 | 3795.90000 | 283.8000 | 318.0000 | 0.800000
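The hourly resampling and the 80:20 split described in the text can be sketched as follows. Averaging within each clock hour is an assumption (the text does not state the aggregation used), the raw readings are invented, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def resample_hourly(readings):
    """Average (timestamp, value) readings into one value per clock hour."""
    buckets = defaultdict(list)
    for stamp, value in readings:
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return [(hour, sum(vals) / len(vals)) for hour, vals in sorted(buckets.items())]


def chronological_split(rows, train_fraction=0.8):
    """Split without shuffling so the test set is strictly later than the training set."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Irregular, made-up ampere readings within three clock hours.
raw = [
    (datetime(2020, 4, 11, 14, 10), 41.0),
    (datetime(2020, 4, 11, 14, 40), 42.0),
    (datetime(2020, 4, 11, 15, 5), 37.0),
    (datetime(2020, 4, 11, 15, 35), 38.0),
    (datetime(2020, 4, 11, 16, 20), 34.0),
]
hourly = resample_hourly(raw)
for hour, mean_value in hourly:
    print(hour, mean_value)

# Splitting 2,675 hourly records (represented here by their indices).
train, test = chronological_split(list(range(2675)))
print(len(train), len(test))
```

Keeping the split chronological matters for forecasting: a shuffled split would let the model train on readings that come after the ones it is tested on.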
Table 5 Data Pre-Processing for Regression Virtual Rate
Table 6 Result Data Processing After EDA
Table 7 Updated Troubleshooting Failure Matrix
Well Condition | Average Ampere | Intake Pressure | Discharge Pressure | Intake Temperature | Motor Temperature | Vibration | Rate | Status
Shaft Broken | D | I | I | I | I | I | D | 1
String Leaking | I | I | I | I | I | - | D | 2
Intake Plugged | D | I | I | I | I | - | D | 3
Perforation Plugged | D | D | D | - | I | - | D | 4
Water Cut Increase | I | I | I | - | C | - | D | 5
Start up with killing fluid | I | C | C | - | I | - | I | 6
Shut-in at surface | D | I | I | - | I | - | D | 7
Pump stage plugged | I | I | I | - | I | - | D | 8
Reservoir pressure increase | I | I | I | - | C | - | I | 9
Free gas increase at intake | I | D | I | - | I | - | D | 10
Pump stage wear | D | I | D | - | I | I | D | 11
Frequency Increase | I | D | I | - | I | - | I | 12
Open choke | C | D | D | - | C | - | I | 13
Closed SCSSV | D | I | I | - | I | - | D | 14
Lower influx | D | D | D | - | I | - | D | 15
Higher influx | I | I | I | - | C | - | I | 16
Reverse rotation | D | I | D | - | I | - | D | 17
Sand Ingestion | I | D | C | - | I | I | D | 18
Gas ingestion | C | D | C | - | I | - | D | 19
Scale build-up on pump | I | D | D | I | I | - | D | 20
Shroud plugged above the sensor | D | I | D | I | I | - | D | 21
Shroud plugged below the sensor | D | D | D | I | I | - | D | 22
Low Cooling motor | I/D | D | D | I | I | - | D | 23
High CP | D | I | D | I | I | - | D | 24
Pump housing leak near the head | I | I | D | I | I | - | D | 25
Pump housing leak at middle to bottom | D | I | D | I | I | - | D | 26
Choke Plugging | D | I | I | I | I | - | D | 27
Table 8 Forecasting Accuracy LSTM
Downhole Sensor MAPE R Squared
Discharge Pressure 2.8428 0.99
Motor Temperature 0.9358 0.92
Average Ampere 1.34 0.95
Intake Temperature 0.5163 0.90
Intake Pressure 2.92 0.98
Vibration 8.074 0.94
Table 9 Sensitivity Layer for LSTM in July
Hidden Layer Model LSTM and R Square
256
128
512
Table 10 MAPE and R Squared for July
Downhole Sensor MAPE R Squared
Discharge Pressure 1.6531 -1.237
Intake Temperature 2.2416 -0.93
Vibration 8.967 -3.21
Table 11 Compare R Square Model LSTM August and July
Downhole Sensor August July
Discharge Pressure 0.99 -1.237
Intake Temperature 0.1 0.930
Vibration 0.84 -3.21
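Table 10 and Table 11 pair MAPE values below 10% with negative R Squared values. The sketch below, using made-up numbers, shows one way this can happen: a forecast with a constant offset on a series that varies little has a small percentage error but fits worse than simply predicting the mean. It is offered as a possible explanation, not as a diagnosis of the July model.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual) * 100.0


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [100.0, 102.0, 101.0, 103.0]   # hypothetical readings with little variation
close = [100.5, 101.5, 101.0, 102.5]    # forecast that tracks the readings
offset = [104.0, 106.0, 105.0, 107.0]   # right shape, but shifted by +4

for name, forecast in (("close", close), ("offset", offset)):
    print(name, round(mape(actual, forecast), 2), round(r_squared(actual, forecast), 2))
```

The offset forecast stays in the "highly accurate" MAPE class while its R Squared is strongly negative, which is why the two metrics are reported together.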
Table 12 Result error from LSTM Model compared with actual data
Downhole Sensor August July
Discharge Pressure
Motor Temperature
Average Ampere
Intake Temperature
Intake Pressure
Vibration
Table 13 Error Blind Test LSTM Model
Well BFPD Predicted BFPD Error
NIS-01 747 619.2 13.4
IIS-08 1067 964.64 9.6
AGN-C2S 277 289.9 4.69
REL-03 171 189.3 10.6
Table 14 Error Blind Test KNN-Regressor
Well BFPD Predicted BFPD Error
NIS-08 5260 4985.707 5.21
IIS-08 1067 1051.69 1.43
AGN-C3 928 924.525 0.37
REL-11 5076 4852.45 4.4
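As a check on how the Error column can be read, the sketch below recomputes it for the four wells of Table 14 as an absolute percentage error and maps the mean to the classes of Table 3. Treating the column as |actual - predicted| / actual is an assumption that happens to reproduce the listed values; the function names are hypothetical.

```python
# (well, BFPD from well test, predicted BFPD) copied from Table 14.
TABLE_14 = [
    ("NIS-08", 5260, 4985.707),
    ("IIS-08", 1067, 1051.69),
    ("AGN-C3", 928, 924.525),
    ("REL-11", 5076, 4852.45),
]


def percentage_error(actual, predicted):
    """Absolute percentage error of one prediction."""
    return abs(actual - predicted) / actual * 100.0


def forecasting_power(mape_value):
    """Class boundaries taken from Table 3."""
    if mape_value < 10:
        return "Highly Accurate Forecasting"
    if mape_value <= 20:
        return "Good Forecasting"
    if mape_value <= 50:
        return "Reasonable Forecasting"
    return "Weak and Inaccurate Forecasting"


errors = {well: round(percentage_error(actual, pred), 2) for well, actual, pred in TABLE_14}
mean_error = sum(percentage_error(actual, pred) for _, actual, pred in TABLE_14) / len(TABLE_14)

print(errors)
print(round(mean_error, 2), forecasting_power(mean_error))
```

The mean here covers only the four wells listed in the table, so it is not expected to equal the average reported for all 43 blind-test wells.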
Table 15 Result of Rate Reconstruction for August
Well | Reading Time | Average Ampere | Intake Pressure | Discharge Pressure | Intake Temperature | Motor Temperature | Vibration | Rate
FHB-03 | 2020-08-01 01.00.00 | 27.2391 | 461.054 | 3824.071 | 285.3533 | 316.235 | 0.8331 | 276.6364
FHB-03 | 2020-08-01 02.00.00 | 27.26844 | 461.3316 | 3824.088 | 285.2121 | 315.8248 | 0.81655 | 276.6166
FHB-03 | 2020-08-01 03.00.00 | 27.11796 | 462.29 | 3824.683 | 285.4109 | 315.8114 | 0.8508 | 276.551
FHB-03 | 2020-08-01 04.00.00 | 27.595 | 461.1531 | 3822.451 | 285.3742 | 316.2411 | 0.840618 | 276.729
… | … | … | … | … | … | … | … | …
FHB-03 | 2020-08-06 01.00.00 | 28.1986 | 461.0296 | 3819.598 | 285.5276 | 316.664 | 0.930413 | 277.2766
Table 16 Result of Rate Reconstruction for July
Well | Reading Time | Average Ampere | Intake Pressure | Discharge Pressure | Intake Temperature | Motor Temperature | Vibration | Rate
FHB-03 | 2020-06-28 01.00.00 | 26.88197 | 493.4565 | 3747.977051 | 287.2691 | 318.453678 | 0.542396 | 292.5868
FHB-03 | 2020-06-28 02.00.00 | 26.40584 | 494.0146 | 3747.997314 | 289.0301 | 321.768209 | 0.511477 | 292.7283
FHB-03 | 2020-06-28 03.00.00 | 26.52379 | 496.79 | 3748.0518 | 288.21 | 319.661499 | 0.544018 | 293.0066
FHB-03 | 2020-06-28 04.00.00 | 27.04052 | 493.3766 | 3747.967773 | 287.7553 | 318.9258 | 0.5079 | 292.9812
… | … | … | … | … | … | … | … | …
FHB-03 | 2020-07-03 03.00.00 | 25.45657 | 503.7595 | 3748.264893 | 288.4318 | 318.6367798 | 0.467114538 | 292.7832
Table 17 Assumption Data for Hagedorn-Brown Correlation
Data | Field | Unit
Depth | 10,109 | ft
Tubing Inner Diameter | 2.992 | in
Oil Gravity | 30 | °API
Oil Viscosity | 5 | cp
Production GLR | 20 | scf/bbl
Gas Specific Gravity | 0.7 | Air = 1
Flowing Tubing Head Pressure | 200 | psia
Flowing Tubing Head Temperature | 80 | °F
Flowing Temperature at the Tubing Shoe | 180 | °F
Liquid Production Rate | 282.106 | stb/day
Water Cut | 42 | %
Interfacial Tension | 30 | dynes/cm
Specific Gravity of Water | 1.05 | H2O = 1
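Once the Hagedorn-Brown pressure drop is known, the tubing pressure estimate of Section 5.3.3 is a single subtraction. The sketch below applies it to the first discharge pressure readings of Table 15 and Table 16 with the pressure drops quoted in the text; it assumes the sensor readings and the pressure drops are in consistent units, and the function name is hypothetical.

```python
def tubing_pressure(discharge_pressure, pressure_drop):
    """T_p = D_p - delta_P, with both arguments in the same pressure unit."""
    return discharge_pressure - pressure_drop


# Discharge pressures: first rows of Table 15 (August) and Table 16 (July).
# Pressure drops: Hagedorn-Brown results quoted in Section 5.3.3.
august = tubing_pressure(3824.071, 3311.4)
july = tubing_pressure(3747.977051, 3235.29)

print(round(august, 2))
print(round(july, 2))
```

The two estimates come out within a fraction of a psi of each other, which suggests the July and August pressure drops were derived on a consistent basis.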
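Table 18 Result of Slope Calculation in August
Average Ampere | Intake Pressure | Discharge Pressure | Intake Temperature | Motor Temperature | Vibration | Rate
0.02934 | 0.2776 | 0.017 | -0.1412 | -0.4102 | -0.01655 | -0.0198
-0.15048 | 0.9584 | 0.595 | 0.1988 | -0.0134 | 0.03425 | -0.0656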
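The slope values can be reproduced from the first three rows of Table 15 by subtracting each hourly reading from the next one. The sketch below does this with plain first differences; the function name is hypothetical, and the rounding to five decimals is only there to suppress floating-point noise.

```python
COLUMNS = ("average_ampere", "intake_pressure", "discharge_pressure",
           "intake_temperature", "motor_temperature", "vibration", "rate")

# First three hourly rows of Table 15 (August), in the column order above.
TABLE_15_ROWS = [
    (27.2391, 461.054, 3824.071, 285.3533, 316.235, 0.8331, 276.6364),
    (27.26844, 461.3316, 3824.088, 285.2121, 315.8248, 0.81655, 276.6166),
    (27.11796, 462.29, 3824.683, 285.4109, 315.8114, 0.8508, 276.551),
]


def slopes(rows):
    """Difference between each row and the preceding row, per column."""
    return [
        tuple(round(current - previous, 5) for previous, current in zip(earlier, later))
        for earlier, later in zip(rows, rows[1:])
    ]


for row in slopes(TABLE_15_ROWS):
    print(dict(zip(COLUMNS, row)))
```

Because the readings are one hour apart, each difference is also the slope per hour, and its sign gives the increase or decrease trend used for labeling.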
Table 19 Sample of Processing Data (Machine Learning)
Average Ampere | Intake Pressure | Discharge Pressure | Intake Temperature | Motor Temperature | Vibration | Rate | Trip
-0.0538 | -0.0017 | 0.0133 | 0.0001 | 0.0016 | 0.2017 | -0.0944 | 9
0.0546 | 0.0645 | 0.0145 | 0.0009 | -0.0015 | 0.2017 | 0.0431 | 4
-0.0038 | -0.0193 | -0.003 | -0.0013 | -0.0008 | -0.0268 | -0.0026 | 0
0.0034 | -0.0304 | 0.0085 | 0.0001 | 0.0006 | -0.0609 | 0.0083 | 6
0.006415 | 0.000111 | 0.002451 | 0.000631 | 0.000986 | 0.046957 | 0.002384 | 0
-0.0297 | -0.0018 | -0.0153 | 0.0005 | -0.0004 | 0.3061 | -0.0023 | 1
Table 20 Pattern Matrix for Classification with Machine Learning
Well Condition | Average Ampere | Intake Pressure | Discharge Pressure | Intake Temperature | Motor Temperature | Vibration | Rate | Status
Low PI | I/D | D | D | - | - | - | D | 1
Pump Wear | D | I | D | - | - | I | D | 2
Tubing Leak | C | I | D | I | I | - | D | 3
Higher PI | I/D | I | I | - | - | - | I | 4
Increase in Frequency | I | D | I | - | I | - | I | 5
Open Choke | I | D | D | - | C | - | D | 6
Increase in Watercut | I | I | I | - | C | - | D | 7
Sand Ingestion | I | D | I/D | - | I/D | I/D | D | 8
Closed Valve (SSSV) | D | I | I | I | I | - | D | 9
Table 21 Result of Failure Prediction in August
Table 22 Result of Failure Prediction in July
Table 23 Notifications for Engineers to Preventive
Table 24 Prescribed Preventive measures.
No. | Well Conditions | Possible Cause | Prescriptive Message
1. | Tubing Leak | | - Confirm by a pressure test at the tubing wellhead.
 | | | - Conduct dead head test or fill up the tubing and pressure up against RCV.
2. | Closed Valve (SSSV) | | - Verify if the valve was deliberately partially closed by Field Service Tech.
 | | | - Contact the Field Service Tech to check out well on location.
3. | Low Productivity Index (PI) | Well productivity less than pump design range | - Analyze the fluid level and Bottom Hole Pressure data.
 | | | - Adjust the tubing wellhead pressure and bring the pump production rate within design rate.
 | | Restricted pump | - Pumping fluids through tubing when water sources are available. If restricted by high viscous oil, solvent or higher gravity fluid should be injected down well annulus to dilute.
 | | | - Use VSD in 'rocking mode' to remove debris.
4. | Higher Productivity Index (PI) | Well productivity greater than pump design range | - Analyze the fluid level and Bottom Hole Pressure data.
 | | | - Adjust the tubing wellhead pressure and bring the pump production rate within the design rate.
 | | Change in fluid characteristics | - Adjust the tubing wellhead pressure and bring the pump production rate within the design rate.
 | | | - Conduct the fluid analysis as a basis for redesigning the pump.
5. | Open Choke | | - Check the pump discharge pressure and well production rate compared to previous well data history.
6. | Sand Ingestion | | - Check flow line and separator for sand, mud, or debris evidence.
 | | | - Design a solid control system for the next installation.
7. | Pump Wear | | - Check that the system performance has decreased (kWh/BPD/1000 ft of lift) over 15-20% from the installation date.
 | | | - Verify if the vibration has increased by 20% from the pump install date.
 | | | - Shut-in test while the surface check valve is closed and the pump runs.
8. | Increase in Frequency | | - Check the pump discharge pressure and compare it to previous well data history.
 | | | - Lower the value of frequency using VSD.
9. | Increase in Water Cut | | - Adjust the tubing wellhead pressure and bring the pump production rate within the design rate.
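
The mapping in Table 24 from a detected well condition to its prescriptive messages lends itself to a simple lookup. The sketch below is one possible way to turn a detected condition into a notification text, assuming the condition label has already been produced by the classification step. Only three of the nine conditions are included for brevity; the names PRESCRIPTIONS and build_notification, and the notification wording, are hypothetical and not taken from the thesis.

```python
# Prescriptive messages keyed by well condition, abridged from Table 24.
# Only three of the nine conditions are shown; the structure is an assumption.
PRESCRIPTIONS = {
    "Tubing Leak": [
        "Confirm by a pressure test at the tubing wellhead.",
        "Conduct dead head test or fill up the tubing and pressure up against RCV.",
    ],
    "Open Choke": [
        "Check the pump discharge pressure and well production rate "
        "compared to previous well data history.",
    ],
    "Increase in Frequency": [
        "Check the pump discharge pressure and compare it to previous well data history.",
        "Lower the value of frequency using VSD.",
    ],
}


def build_notification(well: str, condition: str) -> str:
    """Format a plain-text notification for one detected condition."""
    messages = PRESCRIPTIONS.get(condition)
    if messages is None:
        # Unknown condition: report it without inventing a recommendation.
        return f"{well}: {condition} detected. No prescriptive message is defined."
    lines = [f"{well}: {condition} detected. Recommended actions:"]
    lines += [f"- {message}" for message in messages]
    return "\n".join(lines)
```

Calling build_notification with a well name and one of the condition labels returns a header line followed by one line per recommended action, which can then be sent to the engineer through whatever channel is in use.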

