URTeC: 2669988
Downloaded 11/03/17 to 132.236.27.111. Redistribution subject to SEG license or copyright; see Terms of Use at http://library.seg.org/

Predicting ESP Lifespan With Machine Learning


Jessamyn Sneed*, Devon Energy
Copyright 2017, Unconventional Resources Technology Conference (URTeC) DOI 10.15530/urtec-2017-2669988
This paper was prepared for presentation at the Unconventional Resources Technology Conference held in Austin, Texas, USA, 24-26 July 2017.
The URTeC Technical Program Committee accepted this presentation on the basis of information contained in an abstract submitted by the author(s). The contents of this paper
have not been reviewed by URTeC and URTeC does not warrant the accuracy, reliability, or timeliness of any information herein. All information is the responsibility of, and, is
subject to corrections by the author(s). Any person or entity that relies on any information obtained from this paper does so at their own risk. The information herein does not
necessarily reflect any position of URTeC. Any reproduction, distribution, or storage of any part of this paper without the written consent of URTeC is prohibited.

Abstract

Although electric submersible pumps (ESPs) represent only a small percentage of the artificial lift systems in Devon
Energy’s fleet, ESP repairs are a very expensive issue. In an effort to better understand ESP behavior and
potentially delay ESP failures in the future, the Advanced Analytics team and the Production Operations team
collaborated to statistically identify the key drivers behind ESP failures and determine whether it was possible to
accurately predict an ESP’s lifespan using predictive modeling techniques. Continuous time series data from PI was
summarized over ESP lifetime and combined with static descriptive data from Wellview for fifty-three ESPs across
the Delaware Basin. Data exploration was performed in SAS Enterprise Guide before a number of predictive models
were created in SAS Enterprise Miner.

Model competition was performed using a variety of modeling types such as linear regression, decision trees, and
high performance random forests (HP Forest). The best model, the HP Forest model, was selected based on average
squared error. The HP Forest model predicted ESP lifespans that were, on average, within approximately five days
of the true ESP lifespan, and 90% of the model’s prediction errors were within +/- 30 days of the true ESP lifespan. The
top three variables of importance when predicting ESP lifespan were metrics related to ESP shutdowns. Other
notable variables included those related to proppant size and amount. These results prove that it is possible to create
an appreciably accurate statistical model to predict ESP lifetime using static summarized data. After further
standardization and optimization, this model may be operationalized in the future. This modeling process may also
serve as the basis for future modeling exercises using unsummarized continuous time series data. Key driver
analysis highlighted the influence that ESP shutdowns have on the lifespan of an ESP. Since ESPs can be shut down
as a response to mechanical or human interference, analysis of ESP shutdowns using pattern recognition analysis
and chaos theory may be performed in the future.

Introduction

The recent international downturn in the oil and gas industry following the boom in US unconventional reservoir
production has created a tense atmosphere in which efficiency is paramount. A pressing need has emerged
throughout the industry for streamlined processes and improved operations efficacy. In pursuit of this goal, many
energy companies have turned to analytics and big data to improve decision making and better utilize existing data
processes. Significant investments in technology and talent development have been made, resulting in enhanced data
quality, new data management standards, and the development of advanced analytical capabilities. Data scientists
are being hired en masse to apply their skills in data mining, machine learning, and big data applications to the
energy sector. The key to data scientists’ success in the industry lies not only in their technical skill, but also in their
ability to communicate and collaborate with industry subject matter experts (SMEs). Whether the SME is a reservoir
technologist, drilling engineer, or HR professional, the knowledge that an SME provides gives
context to data and allows for more meaningful statistical analysis. The development of a partnership between a data
scientist and SME, in which both parties are heavily invested and willing to communicate their work to a wider
audience, increases the likelihood that their efforts will create sustainable changes in the business.

Recent decreases in drilling and completions activity have caused the industry’s focus to shift towards improving
operational efficiency in upstream production. Oil and gas companies have begun to utilize analytics in the
production arena, allowing data to influence key operational decisions such as artificial lift method selection and
artificial lift performance optimization. Artificial lift systems are used in wells whose reservoir pressure can no longer
lift fluids to surface at an economic rate; they add energy to the fluid column to sustain production [1]. One common artificial lift method is electric submersible pumping, in which
a multi-stage centrifugal pump powered by an electric motor is submerged up to 10,000 feet below surface.
Advantages of using an electric submersible pump (ESP) include a small surface footprint and a volume capacity higher
than that of any other artificial lift method. However, ESPs also have significant limitations, such as poor handling of gas
and solids and limited adaptability to major changes in reservoir conditions [2,3]. Furthermore, ESPs can have significant
associated costs, such as the establishment of a reliable power supply, costly repairs which can rarely be performed
in the field, and high pulling costs. Despite these drawbacks, ESPs are commonly used in both onshore and offshore
operations across the United States and abroad. An improved understanding of the various factors contributing to
ESP lifespan could lead to significant cost savings by minimizing repair and pulling costs as well as lost production.

The main objective of any analytical process is to tell the story written in the data. This is primarily achieved
through data mining, a multistep technique which carefully examines, modifies, and models historical data.
Predictive modeling is a type of supervised machine learning in which a predictive model is built using labeled data
with known outcomes and then used to infer outcomes from new data. Examples of this statistical modeling process
include linear regression, decision trees, random forests, and neural nets [4]. In this project, these analyses were performed
using a SAS analytical software suite. In recent years, numerous oil and gas professionals have attempted to identify
contributors to ESP failure and predict ESP lifespan, with approaches ranging from harmonic analysis of electrical supplies [5]
to real-time ESP monitoring [6] and failure pattern analysis [7]. A select few of these professionals have
made this attempt using predictive modeling. Guo et al. built a support vector machine model to predict ESP failures
using electrical and frequency data [8]. Gupta et al. provided an analytical framework for real-time ESP predictive
modeling [9]. The project presented in this paper approached this problem in a slightly different manner, using a
combination of historical static data and summarized time series data to create a predictive model to forecast ESP
lifespans as well as identify their key drivers.

Figure 1: Labelled ESP Diagram [2]

Methods

Traditionally, production teams have relied on business intelligence and descriptive statistics to implement
reactive measures to mitigate operational problems. However, this business practice can hinder the agility of an
operational team and does not easily allow for the adoption of new technological advances. In 2013, Devon Energy
created the Operations Excellence (OE) team whose aim was to drive continuous improvement and deliver
sustainable, repeatable results. The members of this team were well-known thought leaders with a diverse range of
operations expertise and a shared interest in unique operations technologies and techniques. Last year, the Advanced
Analytics (AA) team met with the OE team to brainstorm collaboration opportunities that could deliver significant
operational and financial impacts. This meeting resulted in the development of a proof-of-concept exercise to
examine ESP failures. A substantial portion of the production operations budget went to repairing and replacing
ESPs; therefore, any gains in ESP lifespan could lead to significant cost savings. The goal of this project
was to identify the factors that influence ESP lifespan and to build a model that predicts ESP lifespan from
several input variables. To reach this goal, the AA team used the SEMMA project flow developed by the SAS
Institute. An acronym for the data mining process, SEMMA stands for the following: Sample the data, Explore the
data, Modify the data, Model the data, and Assess the model [10]. Having agreed upon the direction of analysis, the
OE team then created a panel of OE production engineers specializing in ESPs to serve as SMEs for the project.

Figure 2: SEMMA diagram (https://sisbinus.blogspot.com/2014/11/processes-in-data-mining.html)

To begin, the SME panel selected a geographic area rich in ESPs, a common target formation, and a specific
timeframe of interest. Employing these guidelines, the AA team created a sample dataset of 51 ESP failures that occurred
across thirty-seven wells between January 2015 and July 2016. Next, the SMEs identified all the potential
factors that affect ESP function. The resulting list included static metrics from across the wellbore lifespan such as
geological characteristics and frac design, as well as continuous time series metrics such as casing pressure and
motor temperature. The SME panel also identified static performance metrics to calculate such as average cycle
duration and total number of cycles. With the potential variables identified, the AA team collaborated with the
corporate Data Management team as well as the IT group to locate the relevant static data across multiple SQL
databases and receive guidance as to efficient architecture navigation and data integration. To access and assess the
ESP time series data, the AA team worked extensively with the corporate Automation team. This team is tasked
with the 24/7 collection and storage of time series data from the field using a data streaming infrastructure. With
their help, the AA team was able to obtain ESP data interpolated to one-minute intervals. Descriptive statistics
were computed for each variable, and the normality of each continuous numeric variable was assessed. Additionally, all
continuous numeric variables underwent correlation analysis, and chi-squared tests were performed on categorical
variables. A data scientist from the AA team then met with the OE SMEs to discuss the data exploration findings,
identify outliers in the data, and determine how to handle missing data.
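
As a minimal sketch of how cycle metrics such as total number of cycles and average cycle duration might be derived from one-minute run-status samples, consider the Python example below. The function and class names are hypothetical, the status values are made up, and treating a "cycle" as one unbroken stretch of running samples is an assumption; the exact definition used by the SME panel is not given here.

```python
from dataclasses import dataclass


@dataclass
class CycleSummary:
    total_cycles: int
    average_cycle_minutes: float


def summarize_cycles(running_flags, minutes_per_sample=1):
    """Count run cycles and their mean length in a run-status series.

    running_flags: sequence of booleans, one per sample (True = pump running).
    Assumption: a "cycle" is one unbroken stretch of running samples.
    """
    cycle_lengths = []
    current = 0
    for is_running in running_flags:
        if is_running:
            current += 1
        elif current:
            cycle_lengths.append(current)  # a shutdown closes the open cycle
            current = 0
    if current:
        cycle_lengths.append(current)  # series ended while still running

    if not cycle_lengths:
        return CycleSummary(0, 0.0)
    mean_minutes = sum(cycle_lengths) / len(cycle_lengths) * minutes_per_sample
    return CycleSummary(len(cycle_lengths), mean_minutes)


# Hypothetical one-minute samples: run 3 min, down 2 min, run 5 min.
status = [True] * 3 + [False] * 2 + [True] * 5
summary = summarize_cycles(status)
print(summary.total_cycles, summary.average_cycle_minutes)
```

Each ESP would contribute one such summary row, which keeps the cycle metrics in the same static, one-record-per-ESP form as the other descriptive variables.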

Figure 3: Data Exploration Result Example. Created with SAS Enterprise Guide.

The challenges encountered during this data exploration phase primarily stemmed from data quality and reliability
issues. Over time, as with any large-scale enterprise, data standards constantly change to improve data quality. This
means that a significant amount of manual data manipulation must be done by the data scientist when using
historical data so that the archival data meets the standards of today. Though time consuming, this data modification
is a crucial step in the predictive modeling process as excessive outliers and inaccurate data can skew the modeling
results. These data challenges can be used to highlight data improvement opportunities and catalyze the creation of
new QA/QC measures by internal data governance groups. Another challenge faced in this exercise was the
utilization of time series data. While time series data can be used to show trends and forecast events, it cannot be
combined directly with static, one-record-per-ESP data in a predictive model without first being summarized. For this exercise, the AA and OE teams chose to take the lifetime
averages of the time series variables. Future iterations of this project may include a predictive model using only time
series data and compare its accuracy to the static data model.
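
The summarization step described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the workflow actually used in SAS: the ESP identifiers, readings, units, and static values are all hypothetical, and only the variable types (motor temperature, casing pressure, proppant amount, lifespan) come from the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical one-minute readings: (esp_id, motor_temp_F, casing_pressure_psi).
readings = [
    ("ESP-A", 210.0, 150.0),
    ("ESP-A", 214.0, 154.0),
    ("ESP-B", 190.0, 120.0),
    ("ESP-B", 194.0, 124.0),
    ("ESP-B", 198.0, 128.0),
]

# Hypothetical static records, one per ESP.
static_data = {
    "ESP-A": {"proppant_lb": 5.0e6, "lifespan_days": 180},
    "ESP-B": {"proppant_lb": 7.5e6, "lifespan_days": 320},
}


def lifetime_averages(rows):
    """Collapse per-minute readings into one mean per ESP and variable."""
    temps, pressures = defaultdict(list), defaultdict(list)
    for esp_id, temp, pressure in rows:
        temps[esp_id].append(temp)
        pressures[esp_id].append(pressure)
    return {
        esp_id: {
            "avg_motor_temp_F": mean(temps[esp_id]),
            "avg_casing_pressure_psi": mean(pressures[esp_id]),
        }
        for esp_id in temps
    }


def build_modeling_table(rows, static):
    """Join summarized time series to static data: one row per ESP."""
    averages = lifetime_averages(rows)
    return {esp_id: {**static[esp_id], **averages[esp_id]} for esp_id in averages}


table = build_modeling_table(readings, static_data)
for esp_id, row in table.items():
    print(esp_id, row)
```

Averaging discards the order and timing of the readings, which is one plausible reason a summarized variable can carry less predictive signal than the underlying series.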

Predictive Modeling and Results

Predictive modeling is a supervised learning technique in which historical data is used to predict future outcomes or
events. The objective of predictive modeling is to clarify the relationship between a group of input variables - in this
case, the factors affecting ESP operation - and a target variable - the length of ESP lifespan. The mathematical
computations needed to build and evaluate a model are incredibly complex and should be done with specialized
statistics software. The modeling performed in this exercise was done in SAS Enterprise Miner. Ideally, the
historical data used for predictive modeling should have enough observations to be partitioned into at least two
groups: a training dataset and a validation dataset. The training dataset is used to fit the models and
typically comprises about 60% of the available data. The validation dataset is used to assess the accuracy of the
predictive models created with the training dataset and typically comprises the remaining 40% of the available data. The
purpose of having these two groups is to detect and avoid overfitting. Overfitting occurs when the predictive model
learns the training dataset so closely, including its noise, that it predicts the training data accurately but
generalizes poorly to new data. The model that best predicts outcomes on the validation dataset should be
chosen as the champion model and used on new unseen data, also known as scoring data [11].
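
The partition-and-compare procedure can be illustrated with a short Python sketch. It assumes synthetic data (shutdown counts and lifespans generated here, not taken from the project), two deliberately simple candidate models, and average squared error on the validation rows as the selection criterion; none of the names below correspond to SAS Enterprise Miner nodes.

```python
import random


def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def fit_mean_model(xs, ys):
    """Baseline candidate: always predict the training mean."""
    avg = sum(ys) / len(ys)
    return lambda x: avg


def fit_linear_model(xs, ys):
    """Ordinary least squares with a single input variable."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Synthetic data (assumption): lifespan falls as the shutdown count rises.
rng = random.Random(0)
shutdowns = [rng.randint(0, 40) for _ in range(100)]
lifespans = [400 - 6 * s + rng.gauss(0, 20) for s in shutdowns]

# 60/40 partition on shuffled row indices.
indices = list(range(len(shutdowns)))
rng.shuffle(indices)
cut = int(0.6 * len(indices))
train_idx, valid_idx = indices[:cut], indices[cut:]

x_train = [shutdowns[i] for i in train_idx]
y_train = [lifespans[i] for i in train_idx]
x_valid = [shutdowns[i] for i in valid_idx]
y_valid = [lifespans[i] for i in valid_idx]

# Fit every candidate on training rows, score it on validation rows only.
candidates = {"mean": fit_mean_model, "linear": fit_linear_model}
validation_ase = {}
for name, fit in candidates.items():
    model = fit(x_train, y_train)
    predictions = [model(x) for x in x_valid]
    validation_ase[name] = average_squared_error(y_valid, predictions)

champion = min(validation_ase, key=validation_ase.get)
print(validation_ase, champion)
```

Because the champion is chosen on rows the models never saw during fitting, a model that merely memorizes the training data gains no advantage in the comparison.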

With only 51 ESP failures in total to analyze, the size of the historical dataset proved challenging when choosing a
modeling approach. Unfortunately, there were not enough observations to partition the data into training and
validation datasets, so the AA team was forced to forgo the validation dataset in this phase of the project. The
predictive models appraised in this exercise included stepwise multiple linear regression, decision tree, and high
performance random forest (HP Forest). These were not the most complex models available; however, they each
have relaxed requirements with regard to data size and were the most compatible with this dataset. Each of the modeling
types has its own method of decreasing the number of variables under consideration so that only the most relevant
are entered into the final model. Model accuracy was evaluated using average squared error. The average squared
error is the mean of the squared differences between the predicted values of the target variable and its actual values. In theory, a
perfect model would have an average squared error of 0, meaning it predicts its target exactly; in practice,
natural variation in the data makes this impossible, so the best model is the one with the lowest average squared error. Multiple
iterations of each model were created in an attempt to decrease the average squared error and improve the accuracy
of the model. The modeling exercise was repeated with a log-transformed target variable in an attempt to account for the
non-normality of the target variable; however, the resulting models proved to be less accurate than those using the
unmodified target variable.
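
A small Python sketch can make the comparison between an unmodified and a log-transformed target concrete. The data are hypothetical, a one-variable least-squares line stands in for the models actually used, and back-transforming the log-target predictions to days before computing average squared error is an assumption about how such a comparison would be made on a common scale.

```python
import math


def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def fit_line(xs, ys):
    """Least-squares intercept and slope for one input variable."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: shutdown counts and ESP lifespans in days.
shutdowns = [2, 5, 9, 14, 20, 27, 35]
lifespan_days = [410, 365, 330, 250, 205, 120, 60]

# Model 1: target left in days.
b0, b1 = fit_line(shutdowns, lifespan_days)
pred_raw = [b0 + b1 * x for x in shutdowns]

# Model 2: target log-transformed, predictions mapped back to days.
c0, c1 = fit_line(shutdowns, [math.log(y) for y in lifespan_days])
pred_log = [math.exp(c0 + c1 * x) for x in shutdowns]

# Both errors are computed in days squared so they can be compared directly.
ase_raw = average_squared_error(lifespan_days, pred_raw)
ase_log = average_squared_error(lifespan_days, pred_log)
print(f"ASE, unmodified target: {ase_raw:.1f} days^2")
print(f"ASE, log target (back-transformed): {ase_log:.1f} days^2")
```

Scoring both models in the original units matters: an error computed on the log scale is not comparable to one computed in days.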

The HP Forest model was found to have the lowest average squared error of all the models considered in the
exercise and was chosen as the champion model. HP Forest is an ensemble model composed of a number of unique
decision tree models. HP Forest models have several advantages over their competitor models, including faster model
training, pseudo-validation comparisons, and more accurate results than a single-model approach [10]. However, the
main disadvantage of the HP Forest model is that its results can be difficult to explain, since the relationships
between the input variables and the target variable are convoluted. The champion model had an average residual of 5.06
days and a root average squared error of 31.48 days. This means that, on average, the model predicted ESP lifespan
to within approximately five days of the true ESP lifespan, and 90% of the model’s prediction errors were within +/- 30
days of the true ESP lifespan. It bears noting that the model tended to underpredict ESP lifespans greater than one
year, but this could be due to a shortage of long-lived ESPs in the training dataset. The predictive modeling exercise
also produced a ranking of the most influential variables in the dataset. Many of the top influencers in the model
measured either ESP cycling (when the ESP is temporarily shut down and restarted) or completions design. The time
series variables considered in the model were not found to be very influential; however, this is most likely due to
their summarized nature. Future iterations of this project will analyze these time series variables by themselves in an
effort to forecast ESP failures.
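
The three accuracy figures quoted above (average residual, root average squared error, and the share of errors within +/- 30 days) measure different things, which the following Python sketch illustrates on made-up numbers. The lifespans are hypothetical and chosen only for illustration, and defining the residual as actual minus predicted is an assumption, since the sign convention is not stated here.

```python
import math

# Hypothetical lifespans in days (illustrative only, not the project's data).
actual = [120, 200, 310, 95, 400, 260, 180, 330, 150, 220]
predicted = [130, 190, 300, 100, 350, 270, 185, 325, 160, 215]

# Assumption: residual = actual - predicted, so positive means underprediction.
residuals = [a - p for a, p in zip(actual, predicted)]
n = len(residuals)

# Signed mean: positive and negative errors cancel, so this measures bias.
mean_residual = sum(residuals) / n

# Mean absolute error: typical size of a miss, ignoring direction.
mean_abs_error = sum(abs(r) for r in residuals) / n

# Root average squared error: like the above, but penalizes large misses more.
rmse = math.sqrt(sum(r * r for r in residuals) / n)

# Share of predictions that land within 30 days of the true lifespan.
share_within_30 = sum(abs(r) <= 30 for r in residuals) / n

print(f"mean residual:  {mean_residual:.2f} days")
print(f"mean abs error: {mean_abs_error:.2f} days")
print(f"RMSE:           {rmse:.2f} days")
print(f"within 30 days: {share_within_30:.0%}")
```

In this toy data a single long-lived pump is underpredicted by 50 days, which barely moves the signed mean but dominates the root average squared error, so a small average residual alone does not imply that individual predictions are close.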

Figure 4: Predicted vs actual comparison plot for ESP lifespan

Conclusion
This innovative approach to modeling ESP lifespan was only possible through the effective collaboration between
OE subject matter experts, the Advanced Analytics team, and other key data groups. The results of this exercise
proved that it was possible to apply predictive modeling techniques to artificial lift operation using internal data.
Engaging in the data mining process brought about a number of achievements including the dynamic integration of
data siloes, the identification of data improvement opportunities, and the selection of key drivers for ESP lifespan.
Though the model was not operationalized, the OE team gained value from knowing which factors influenced ESP
operational duration. This knowledge will allow them to focus their efforts on improving the most influential ESP
lifespan factors rather than making minimal changes to multiple factors. The predictive model built in this exercise
is currently being validated using data on ESPs that have failed within the last six months. In the future, this model
may be modified to include additional metrics focusing on operating range deviations. In conclusion, this data
mining exercise was instrumental in introducing advanced analytics into the production operations environment. The
collaboration between the OE team and the Advanced Analytics team has grown beyond this project and has
allowed for data science to be more readily accepted by other production teams.

Acknowledgements

The author would like to thank the Society of Petroleum Engineers, American Association of Petroleum Geologists,
and Society of Exploration Geophysicists, as well as the program committee for the opportunity to present the
contents of this paper. Special thanks go to Kathy Ball, Beau Rollins, and the OE team for their review of this
manuscript and continued support.

References

[1] Takacs, G. 2009. Electric Submersible Pumps Manual: Design, Operation, and Maintenance. Oxford, UK: Gulf
Professional Publishing/Elsevier.

[2] Lea, J. 2009. Artificial Lift Systems. Presented by Petroskills. Oklahoma City, Oklahoma, 12-15 January.

[3] Lea, J. and Bearden, J. 1999. ESP’s: On and Offshore Problems and Solutions. Paper SPE-52159-MS presented
at SPE Mid-Continent Operations Symposium, Oklahoma City, Oklahoma, 28-31 March.
https://doi.org/10.2118/52159-MS

[4] Holdaway, K. 2014. Harness Oil and Gas Big Data with Analytics. Hoboken, New Jersey: John Wiley and Sons.

[5] Pragale, R. and Shipp, D. 2016. Investigation of Premature ESP Failures and Oil Field Harmonic Analysis. IEEE
Transactions on Industry Applications PP(99): 1-1. http://dx.doi.org/10.1109/TIA.2016.2608958

[6] Macary, S., El-Noby, M., Al Latif, M., Awni, I., and Mohamed, I. 2003. Downhole Permanent Monitoring Tackles
ESP Problematic Wells. Paper OMC-2003-100 presented at the Offshore Mediterranean Conference and Exhibition,
Ravenna, Italy, 27-28 March.

[7] Kalu-Ulu, T.C., Andrawus, J.A., and George, I.P.S. 2011. Modelling System Failures of Electric Submersible Pumps
in Sand Producing Wells. Paper SPE-151011-MS presented at the SPE Nigeria Annual International Conference and
Exhibition, Abuja, Nigeria, 30 July – 3 August. https://doi.org/10.2118/151011-MS

[8] Guo, D., Raghavendra, C.S., Yao, K.T., Harding, M., Anvar, A., and Patel, A. 2015. Data Driven Approach to
Failure Prediction for Electric Submersible Pump Systems. Paper SPE-174062-MS presented at the SPE Western
Regional Meeting, Garden Grove, California, 27-30 April. https://doi.org/10.2118/174062-MS

[9] Gupta, S., Nikolaou, M., Saputelli, L., and Bravo, C. 2016. ESP Health Monitoring KPI: A Real-Time Predictive
Analytics Application. Paper SPE-181009-MS presented at SPE Intelligent Energy International Conference and
Exhibition, Aberdeen, Scotland, 6-8 September. https://doi.org/10.2118/181009-MS

[10] Christie, P., Georges, J., and Wells, C. 2011. Applied Analytics using SAS Enterprise Miner. Cary, North Carolina:
SAS Institute, Inc.

[11] Rollins, B. and Herrin, M. 2015. Finding the Key Drivers of Oil Production through SAS Data Integration and
Analysis. Paper URTEC-2150079-MS presented at the Unconventional Resources Technology Conference, San
Antonio, Texas, 20-22 July. https://doi.org/10.15530/URTEC-2015-2150079
