HANDS-ON TUTORIALS

Prediction of P-Sonic Log in the Volve Oil Field using Machine Learning
Explained step by step, using Scikit-Learn

Yohanes Nuwara Oct 15, 2020 · 11 min read

Mærsk Inspirer at Volve, from the Equinor Photo Archive


In 2018, the Norwegian oil company Equinor disclosed a massive subsurface and operations
dataset from its Volve oil field in the North Sea. In the two years since, this has been
good news for everyone passionate about improving and solving challenges in the study of
oil and gas fields, in universities, research institutions, and companies alike. The
following is an encouraging quote from Jannicke Nilsson, the COO of Equinor.

“Volve is an example of how we searched for every possibility to extend the field life. Now we want to share all Volve data to ensure learning and development of future solutions.”
Volve is an oil field located 200 kilometers west of Stavanger, at the southern end of the
Norwegian sector of the North Sea; it produced from 2008 until 2016.

When I had a look inside the database, which everyone can access through this website, I
saw tremendous treasure inside! I started pitching ideas for bringing machine learning to
it, until I settled on sonic log prediction, for the reasons I elaborate in the Motivation
section of this article.

I keep this project in my GitHub repository, volve-machine-learning, which you are
welcome to visit.

Overview of the Dataset


In the Volve field open database, there are 24 wells. In this study, only 5 wells are used.
The well names are 15/9-F-11A, 15/9-F-11B, 15/9-F-1A, 15/9-F-1B, and 15/9-F-1C.


Well-log data represents each rock stratum of the earth (Aliouane et al., 2012)

Each of these wells has what are known as logs. The logs are physical measurements that
represent the properties of each rock stratum over depth. The following is a list of the
logs that we will use.

NPHI is the formation porosity, measured in v/v.

RHOB is the formation bulk density, measured in grams per cubic centimeter.

GR is the formation radioactivity content, measured in API.

RT is the formation true resistivity, measured in ohm-meters.

PEF is the formation photoelectric absorption factor, dimensionless.

CALI is the borehole diameter, measured in inches.

DT is the compressional (P-wave) travel time, measured in microseconds per foot.

DTS is the shear (S-wave) travel time, measured in the same unit as DT.

These well-log datasets are in the LAS 2.0 file format, a format specific to well logs.
You can find the datasets here.

Motivation
3 wells out of the 5 (wells 15/9-F-11A, 15/9-F-1A, and 15/9-F-1B) have this complete suite
of logs, while the other 2 (wells 15/9-F-11B and 15/9-F-1C) don't have the DT and DTS
logs. This is why we can use supervised learning: regression models can produce the DT
log in the incomplete datasets.

The 3 datasets that have the DT log are used as the training data, and the 2 that don't
are used as the test data. The NPHI, RHOB, GR, RT, PEF, and CALI logs are used as the
features, whereas DT is the target of the prediction. For now, the DTS log will not
be used.

Scikit-Learn logo from GitHub

Through supervised learning, the training data will be used to train several regression
models in Scikit-Learn. The trained model will then be used to produce new DT logs
predicted from the features.

I will discuss the workflow in eight steps. You may also access my IPython notebook
inside my GitHub repo, which runs this workflow from start to finish.

Access my IPython notebook

Step 1. Displaying the Well-log Dataset


A Python library called lasio is used to read these LAS datasets. In any formation
evaluation, displaying the well logs is routine. The following is the display of one of
the training wells, 15/9-F-1B, produced using Matplotlib. Each track represents one
log; as already discussed, NPHI, RHOB, GR, RT, PEF, and CALI are the features, and DT is
the target.


Well-log display of well 15/9-F-1B
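
For reference, here is a minimal sketch of how such a display can be produced with lasio
and Matplotlib. The file name is hypothetical, and the depth mnemonic (DEPT here) should
be checked against your own LAS headers.

import lasio
import matplotlib.pyplot as plt

# read the LAS 2.0 file; las.df() returns a DataFrame indexed by depth
las = lasio.read('15_9-F-1B.las')  # hypothetical file name
df = las.df().reset_index()        # the depth index becomes the DEPT column

logs = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']
fig, axes = plt.subplots(1, len(logs), figsize=(15, 10), sharey=True)
for ax, log in zip(axes, logs):
    ax.plot(df[log], df['DEPT'], lw=0.5)
    ax.set_xlabel(log)
axes[0].set_ylabel('Depth (m)')
axes[0].invert_yaxis()             # depth increases downward
plt.show()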

The following is the well-log display of well 15/9-F-1C, one of the 2 wells that don't
have a DT log and for which we will therefore predict a new one.


Well-log display of well 15/9-F-1C

Step 2. Data Preparation


The second step is the most critical part of the workflow, as it determines the success
of the whole prediction. Pandas is used for the data processing.

First, we need to make sure that our data does not contain any missing values (NaNs).
One trick to filter the data is to set minimum and maximum depth limits so that all the
data starts and ends with numerical values. For instance, the previously displayed well
15/9-F-1B runs from a depth of 3,100 m to 3,400 m. Sample code to do this:

df = df.loc[(df['DEPTH'] >= 3100) & (df['DEPTH'] <= 3400)]


After that, we check if NaNs exist in our data.

df.isnull().sum()

If it returns all zeros, we are good to go. In our case it does, so the data is already
clean of NaNs. Otherwise, we would need to handle the NaN values; this is extensively
discussed here.

Next, we merge the individual well datasets into two larger data frames, one for the
training set and one for the test set.

df = pd.concat([df1, df2, df3])

Now that we have the training and test data frames, we finally assign the well names.
This makes it easy to retrieve any individual well during the prediction. You can see
inside the notebook how I assign the well names.
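
As a sketch of one way to do it (the WELL column name is my own choice, not necessarily
the one used in the notebook):

import pandas as pd

# tag each well's data frame before concatenating
wells = {'15/9-F-11A': df1, '15/9-F-1A': df2, '15/9-F-1B': df3}
for name, frame in wells.items():
    frame['WELL'] = name

df_train = pd.concat([df1, df2, df3], ignore_index=True)

# later, retrieve a single well by name
df_f1b = df_train[df_train['WELL'] == '15/9-F-1B']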

Below is the final data frame for the training data.

Training data frame


Step 3. Exploratory Data Analysis
Exploratory data analysis (EDA) is crucial for understanding our data. Two important
things we want to know are the distribution of each individual feature and the
correlation of one feature with another.

To observe the multivariate distribution, we can use a pair-plot from the Seaborn
package. The following is the multivariate distribution of the features and the target.

Pair-plot of the training dataset
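
A sketch of how such a pair-plot can be produced, assuming df_train holds the merged
training data:

import seaborn as sns
import matplotlib.pyplot as plt

cols = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']
sns.pairplot(df_train[cols], diag_kind='kde', plot_kws={'s': 5})
plt.show()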

We get at least three things from the pair-plot. First, most of the distributions are
skewed rather than ideally Gaussian, especially RT; for machine learning, Gaussian or
less skewed data is preferred, so we will normalize the data, as discussed next. Second,
we can see outliers inside the data; removing them is discussed shortly afterwards.
Third, some data pairs are almost linearly (thus highly) correlated, such as NPHI and
DT, or inversely correlated, such as RHOB and DT. A pair-plot tells us a lot.

We look further into the correlations among the features and the target by calculating
Spearman's correlation and visualizing the results as a heatmap. The following is the
Spearman's correlation heatmap of our data.

Spearman’s correlation heatmap of the training data
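
A sketch of the heatmap, using Pandas for the correlation matrix and Seaborn for the
plot:

import seaborn as sns
import matplotlib.pyplot as plt

cols = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']

# Spearman's rank correlation between all pairs of logs
corr = df_train[cols].corr(method='spearman')
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.show()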

Focusing on the DT row, we obtain the 2 largest correlations: between DT and NPHI (a
positive correlation of 0.95) and between DT and RHOB (a negative correlation of 0.79).
This matches what we saw before in the pair-plot. The other features also seem to
correlate highly with DT, except CALI, which likewise has little correlation with the
other features.


As a common practice, any feature with very low correlation is excluded from the
prediction, so CALI could be excluded. Here, however, I will keep CALI as a feature.

Step 4. Normalization
We know from the pair-plot that most of the distributions are skewed. To improve the
prediction performance later on, we'd better normalize the data (others may call this
scaling). Normalization transforms the data, without changing the information it
carries, so that it is better distributed.

Before doing any normalization, I prefer to log-transform the resistivity data first.

import numpy as np

df['RT'] = np.log10(df['RT'])

Then, I do the normalization with the fit_transform function in Scikit-Learn. There are
several normalization techniques; the most widely used are standardization (transforming
with the mean and standard deviation) and min-max scaling (with the minimum and maximum
values). After trying all the methods, I find that the power transform with the
Yeo-Johnson method works best.
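
A minimal sketch, assuming df_train holds the merged training data. Keeping the fitted
transformer around matters, because its inverse_transform is needed later to
de-normalize the predictions:

from sklearn.preprocessing import PowerTransformer

log_cols = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']
scaler = PowerTransformer(method='yeo-johnson')

# fit on the training data and transform it in place
df_train[log_cols] = scaler.fit_transform(df_train[log_cols])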

After normalization, we can see how the data is now distributed by producing a pair-plot again.


Pair-plot of training data after power transformation with Yeo-Johnson method

Look how NPHI, DT, and RT are now less skewed and more Gaussian-like. Although the RHOB
and GR distributions look multimodal, they become more centered after normalization.

Step 5. Removing Outliers

We also observed many outliers in the data. The presence of outliers can degrade the
prediction performance. Therefore, we perform outlier removal.

Scikit-Learn provides several outlier removal methods, such as Isolation Forest,
Minimum Covariance using the Elliptic Envelope, Local Outlier Factor, and One-class
Support Vector Machine. Alongside these, the most widely used, though basic, outlier
removal method is the standard deviation method: we keep only values that lie within a
threshold number of standard deviations from the mean. We can build this ourselves:

threshold = 3
# keep only the rows where every column lies within 3 standard deviations of its mean
df = df[(np.abs(df - df.mean()) <= threshold * df.std()).all(axis=1)]

All 5 methods are implemented. I use two ways to compare which method performs the best
outlier removal. One way is to count the data before and after the outliers are removed
for each method.

Data counts before and after outlier removal

From this result, we see that the standard deviation method removes the fewest outliers
(only 302), followed by One-class SVM and Minimum Covariance with relatively few,
compared to the other methods (>10,000). As you may already know, the fewer outliers
removed, the better, since more of the data is preserved.

Then, to decide which is better between the standard deviation method and One-class SVM,
I produce box-plots with Pandas for each feature before and after outlier removal.
Below are the box-plots.

The key observation is that even after outlier removal, outliers still exist in the data
relative to the newly computed summary statistics. This gives an (indirect) visual way
to choose the best method. Observing the number of remaining outliers, it becomes
visible that One-class SVM performs “cleaner” than the standard deviation method.
Although Minimum Covariance is also clean, One-class SVM is still the winner.

The conclusion: we use One-class SVM; a minimal sketch follows. Afterwards, we again
produce a pair-plot to observe the final state of our data.
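
A sketch of the One-class SVM removal; the nu value here is an assumption, and it
roughly controls the fraction of the data treated as outliers:

from sklearn.svm import OneClassSVM

log_cols = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']

# fit_predict labels inliers +1 and outliers -1
svm = OneClassSVM(nu=0.1)
labels = svm.fit_predict(df_train[log_cols])
df_train = df_train[labels == 1]   # keep the inliers only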


Pair-plot of the training data after outliers removed using One-class SVM method

Look how the outliers are now much reduced. We are all set for machine learning.

Step 6. Prediction! First Attempt


Now comes the main course! In this step, we do not yet run the prediction on our real
test data (wells 15/9-F-11B and 15/9-F-1C, which don't have a DT log). Be patient! We
first need to evaluate the performance of each regression model by training on the
training data, testing the model on that same training data, and then evaluating how
close the predicted DT log is to the true DT log.

In this step, test data = train data


I tried 6 regression models from Scikit-Learn, namely the classic linear regression,
Random Forest, Support Vector Machine, Decision Tree, Gradient Boosting, and
K-Nearest Neighbors regressors.

Keep in mind that we always have to de-normalize the result after the prediction,
because we normalized the data earlier. In Scikit-Learn, we do this with the
inverse_transform function.

I use R² and the root mean squared error (RMSE) as the scoring metrics to measure the
performance of each regression model. A sketch of this evaluation loop, and the
resulting scores for each regressor, follow.
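
A sketch of this first-attempt evaluation, fitting and scoring each regressor on the
same (normalized) training data:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score, mean_squared_error

X = df_train[['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI']]
y = df_train['DT']

models = {'Linear': LinearRegression(),
          'Random Forest': RandomForestRegressor(),
          'SVM': SVR(),
          'Decision Tree': DecisionTreeRegressor(),
          'Gradient Boosting': GradientBoostingRegressor(),
          'KNN': KNeighborsRegressor()}

for name, model in models.items():
    model.fit(X, y)
    y_pred = model.predict(X)   # test data = train data in this step
    rmse = np.sqrt(mean_squared_error(y, y_pred))
    print(f'{name}: R2 = {r2_score(y, y_pred):.2f}, RMSE = {rmse:.2f}')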


Scoring metric for 6 regressors

From the result, we see that the regressors perform excellently! It is understandable
that the classic linear regressor doesn't perform as well as the other regressors
(lowest R² and highest RMSE): not all features are perfectly and linearly correlated
with the target.

We can then display the true DT log and the predicted DT log together to compare how
close they are. The following is the true vs. predicted DT log for each well using the
Gradient Boosting regressor.


True vs. predicted DT log using Gradient Boosting regressor

In fact, all the regressors I use still run with their default hyperparameters. For
instance, two of the several hyperparameters of Gradient Boosting are the number of
estimators and the maximum depth; the defaults are 100 and 3, respectively.

From the scoring metric of Gradient Boosting, we already know that the R² and RMSE
achieved are around 0.94 and 0.22, respectively. We can do hyperparameter tuning to
determine the hyperparameter values that improve the performance score.

Step 7. Hyperparameter Tuning


The hyperparameter tuning of the Gradient Boosting regressor starts with a train-test
split, with a composition of 0.7 train and 0.3 test. Then, on the resulting splits, a
grid search over defined sets of hyperparameters with 3-fold cross-validation is run.
The following are the searched parameter grids, with a sketch of the search after the
list.

Number of estimators (n_estimators): 100 and 1,000

Maximum depth (max_depth): 10 and 100
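
A sketch of this grid search, with X and y as in the evaluation sketch above:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# 0.7 train / 0.3 test split of the training wells
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7,
                                          random_state=42)

param_grid = {'n_estimators': [100, 1000], 'max_depth': [10, 100]}
grid = GridSearchCV(GradientBoostingRegressor(), param_grid, cv=3)
grid.fit(X_tr, y_tr)

print(grid.best_params_)       # the best hyperparameter combination
print(grid.score(X_te, y_te))  # R² on the held-out 30 percent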

It took ~5 minutes for the tuning to return 1,000 and 100 as the best values for the
number of estimators and the maximum depth, respectively.

The previous step is then repeated with these hyperparameters, and it prints the new
scoring metric as follows.

Scoring metric of Gradient Boosting after hyperparameter tuning

Both R² and RMSE improve a lot, to around 0.98 and 0.12! With this result, we are
confident enough to use Gradient Boosting for the prediction. The following is the true
vs. predicted DT log plot after hyperparameter tuning.


True vs. predicted DT log using Gradient Boosting regressor after hyperparameter tuning

Step 8. Compile the Tuned Gradient Boosting Regressor for Final Prediction
Finally, we do the real prediction! We now compile the Gradient Boosting regressor with
the previously tuned hyperparameters (number of estimators = 1,000 and maximum depth =
10) and apply it to our real test data. Remember that the real test data are the wells
that don't have a DT log: well 15/9-F-11B and well 15/9-F-1C, also called well 2 and
well 5.
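
A minimal sketch of this final step, assuming X_test holds the normalized features of
the two test wells; remember to de-normalize dt_pred afterwards with the transformer's
inverse_transform:

from sklearn.ensemble import GradientBoostingRegressor

# refit on all the training data with the tuned hyperparameters
model = GradientBoostingRegressor(n_estimators=1000, max_depth=10)
model.fit(X, y)
dt_pred = model.predict(X_test)   # predicted DT, still in normalized space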

The following are the predicted DT logs in well 2 and well 5.


The predicted logs in well 15/9-F-11B and well 15/9-F-1C are pleasing! To close the
workflow, we can append the predicted results to the original datasets and write them to
a CSV file, as sketched below.
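
A sketch of the export, where the column and file names are my own choices:

# append the (de-normalized) prediction and export to CSV
df_test['DT_PRED'] = dt_pred
df_test.to_csv('volve_predicted_DT.csv', index=False)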

Here is the final CSV result: well 15/9-F-11B and 15/9-F-1C

Conclusion
We have demonstrated the successful application of supervised learning to the Volve oil
field open dataset owned by Equinor. With this workflow, new P-sonic (DT) logs have been
predicted with the Gradient Boosting method for two wells that originally did not have a
DT log.

I hope this article brings fresh air to ML practitioners in geoscience and encourages
them to start exploring other possibilities.

Machine Learning Oil And Gas Gradient Boosting Geoscience Hands On Tutorials
