
Application of Python and Data Analytics in Oil and Gas
Jaiyesh Chahar | jaiyesh0002@gmail.com

Reservoir Engineer (Data Analyst) at Dicelytics Pvt. Ltd. (Dice Technologies LLC)

Contents of this Notebook:

1. Basics of Python with oil and gas examples
2. Data Structures: Lists, Dictionaries, Tuples, Sets
3. Numpy
4. Pandas: With Volve Field Production Data
5. Matplotlib
6. Interactive Plotting: Pressure Profile in reservoir by varying Parameters
7. Oil and Gas Mini Projects: Vogel's IPR, Material Balance (Gas, Oil)

Useful Links:
1. Contact me at: https://www.linkedin.com/in/jaiyesh-chahar-9b3642107/
2. For more of my Projects: https://github.com/jaiyesh
3. Playlist of Python for oil and gas by Petroleum from Scratch: https://www.youtube.com/watch?v=UjdPncyGkIs&list=PLLwtZopJNyqYGXEYmt0zezAEuS616rACw
4. Petroleum from Scratch: https://www.linkedin.com/company/petroleum-from-scratch/?viewAsMember=true

Welcome Everyone!
I am very excited to take you through this exciting journey of applying Python and Data Analytics in the O&G industry.

My aim is to take you to a better, more confident and super comfortable place as far as Python for Oil and Gas is concerned.

We will start with the basics of Python and then move on to use cases in the industry.

Here goes.
Python from Scratch
In [1]: # is used for comments, either on a line of their own or inline after code
#Starting with the print function
print('Hello Guys')
#Escape Sequence \n helps change line.
print('Hello \nPDEU')

Hello Guys
Hello
PDEU

Single Value Data Types

1. Integers, e.g. 7
2. Floats, e.g. 7.0
3. Booleans: either True or False
4. Strings: sequences of characters that can also contain spaces and digits, created with quotation marks, e.g. "6"
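The built-in `type()` function tells us which of these types a value has. A small sketch with one example value per type from the list above:

```python
# One example value for each single-value data type listed above
int_type = type(7).__name__        # 'int'
float_type = type(7.0).__name__    # 'float'
bool_type = type(True).__name__    # 'bool'
str_type = type("6").__name__      # 'str'

print(int_type, float_type, bool_type, str_type)

# The same digit behaves differently depending on its type
print(6 + 6)       # 12  (integer addition)
print("6" + "6")   # 66  (string concatenation)
```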

Mathematical Operations
In [2]: # Addition
print(4+5)
print(4.0+5)
#Subtraction
print(5-1)
print(5.0-1)
#Multiplication
print(2*3)
#Division
print(625/10)

9
9.0
4
4.0
6
62.5

In [3]: ## Exponent: using **


print(5**2)

25

In [4]: ## Integer Division using //


print(26//5)

5
In [5]: ## To get remainder using %
print(26%5)

1
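Integer division and remainder are often needed together, and the built-in `divmod()` returns both at once. A short sketch; the hours-to-days split is only an illustrative use:

```python
# divmod(a, b) returns the pair (a // b, a % b) in one call
quotient, remainder = divmod(26, 5)
print(quotient, remainder)   # 5 1

# Illustrative use: splitting a total number of hours into days and leftover hours
total_hours = 50
days, hours = divmod(total_hours, 24)
print(days, 'days and', hours, 'hours')   # 2 days and 2 hours
```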

Strings

In [6]: # Strings are surrounded by either single quotation marks or double quotation marks

6 : Integer
6.0 : Float
"6" : String

In [7]: print('Hey There's You')

File "<ipython-input-7-4b79f5295f0b>", line 1
print('Hey There's You')
^
SyntaxError: invalid syntax

In [8]: print('Hey There\'s You')

Hey There's You

In [9]: print('Petroleum'+'Engineering')

PetroleumEngineering

In [10]: print('Spam'*3)

SpamSpamSpam

In [11]: print(4*3)
print(4*'3')

12
3333

In [12]: type(2)

Out[12]: int

In [13]: type(4*'3')

Out[13]: str

Variables: a variable is a name that stores a value

The '=' sign is used for initialising (assigning) a variable
In [14]: #This is called initialization.
#The value from RHS gets stored into the variable at LHS.
x = 6
y = 7

In [15]: print(x*2 - y*3)

-9

In [16]: # Single values are stored in variables.
# Variable names cannot start with a number or a special character.
# They cannot contain spaces, but underscores are allowed.
#Ex - 1a is invalid
#Ex - a1 is valid
#Ex - a_1 and _1_a are both valid.
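Python can check these naming rules for us: `str.isidentifier()` reports whether a string is a syntactically valid name, and the standard `keyword` module lists reserved words, which are valid identifiers but still cannot be used as variable names. A sketch using the examples above:

```python
import keyword

# Candidate variable names, including the examples from the rules above
candidates = ['1a', 'a1', 'a_1', '_1_a', 'my var']

validity = {}
for name in candidates:
    validity[name] = name.isidentifier()
    print(repr(name), '->', validity[name])

# Keywords such as 'while' look like identifiers but are reserved
print(keyword.iskeyword('while'))   # True
```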

Input

In [17]: porosity = input('Enter the Formation porosity: ')

Enter the Formation porosity: 0.5

In [18]: porosity

Out[18]: '0.5'

In [19]: type(porosity)

Out[19]: str

In [20]: porosity = float(input('Enter the Formation porosity: '))

Enter the Formation porosity: 0.6

In [21]: type(porosity)

Out[21]: float

In [23]: permeability = float(input(' Enter the formation\'s permeability(md): '))

Enter the formation's permeability(md): 35

In [24]: print(f'Formation porosity is {porosity} and permeability is {permeability}')

Formation porosity is 0.6 and permeability is 35.0

In [25]: print('Formation porosity is {} and permeability is {}'.format(porosity,permeability))

Formation porosity is 0.6 and permeability is 35.0


In [26]: print('Formation porosity is', porosity, 'and permeability is', permeability )

Formation porosity is 0.6 and permeability is 35.0
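A format specification after a colon inside the f-string braces controls how a number is displayed. A sketch with fixed values in place of `input()` (0.6 and 35.0 mirror the run above; the "md" unit label is added for readability):

```python
porosity = 0.6        # fraction
permeability = 35.0   # md

# :.0% shows a fraction as a percentage with no decimals, :.1f keeps one decimal place
message = f'Formation porosity is {porosity:.0%} and permeability is {permeability:.1f} md'
print(message)   # Formation porosity is 60% and permeability is 35.0 md
```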

Booleans and Comparison

In [27]: print(2!= 3)

True

In [28]: print(2 == 3 )

False

In [29]: print(2 = 3 )

File "<ipython-input-29-2981d28b369a>", line 1
print(2 = 3 )
^
SyntaxError: keyword can't be an expression

In [30]: print(5>3 or 3>5 )

True

In [31]: print(5>3 and 3>5 )

False

If-else: to run code only if a certain condition holds true

In [32]: Reservoir_Pressure = float(input('Enter the reservoir pressure(psi): '))
Hydrostatic_pressure = float(input('Enter the Hydrostatic pressure of mud(psi): '))
Fracture_pressure = float(input('Enter the Fracture pressure of rock(psi): '))

Enter the reservoir pressure(psi): 1000
Enter the Hydrostatic pressure of mud(psi): 1200
Enter the Fracture pressure of rock(psi): 1300

In [33]: if Hydrostatic_pressure > Reservoir_Pressure and Hydrostatic_pressure < Fracture_pressure:
    print('Safe Zone')
# >= so that mud pressure equal to the fracture pressure is flagged as a fracture risk, not a kick
elif Hydrostatic_pressure > Reservoir_Pressure and Hydrostatic_pressure >= Fracture_pressure:
    print('Risk of formation Fracture')
else:
    print('Risk of kick')

Safe Zone
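Because `input()` waits for the user, it can help to put the same checks in a function and pass the pressures as arguments, so several cases can be tested quickly. A sketch; the function name `mud_window_status` is hypothetical, and treating equality with reservoir pressure as a kick risk is an assumption:

```python
def mud_window_status(reservoir_pressure, hydrostatic_pressure, fracture_pressure):
    """Classify the mud hydrostatic pressure (psi) against the drilling window."""
    if hydrostatic_pressure <= reservoir_pressure:
        # Mud does not overbalance the formation pressure: influx is possible
        return 'Risk of kick'
    elif hydrostatic_pressure >= fracture_pressure:
        # Mud pressure reaches the fracture pressure of the rock
        return 'Risk of formation Fracture'
    else:
        return 'Safe Zone'

print(mud_window_status(1000, 1200, 1300))   # Safe Zone
print(mud_window_status(1000, 900, 1300))    # Risk of kick
print(mud_window_status(1000, 1400, 1300))   # Risk of formation Fracture
```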
While Loops: to repeat a block of code as long as a condition holds true
The code in the body of a while loop is executed repeatedly; each repetition is called an iteration.

In [35]: i = 1
while i<=5:
    print(i)
    i = i+1

print('Finished')

1
2
3
4
5
Finished

In [36]: ## Can be used to stop iteration after a specific input

In [37]: Password = input('Enter the Password: ')

Enter the Password: we

In [38]: while Password!= 'abcd':
    # Warn first, then ask again, so a correct password is never reported as wrong
    print('Wrong Password Enter Again')
    Password = input('Enter the Password: ')
print('Access Granted')

Wrong Password Enter Again
Enter the Password: ww
Wrong Password Enter Again
Enter the Password: dcc
Wrong Password Enter Again
Enter the Password: abcd
Access Granted
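A while loop is a natural fit when the number of iterations is not known in advance. As a hypothetical example (the 3000 psi start, 10% yearly decline and 2000 psi limit are made-up numbers, not a reservoir model), count the years until pressure drops below a limit:

```python
pressure = 3000.0   # psi, hypothetical initial reservoir pressure
limit = 2000.0      # psi, hypothetical lower limit
decline = 0.10      # hypothetical fractional pressure decline per year
years = 0

# Keep iterating while the pressure is still at or above the limit
while pressure >= limit:
    pressure = pressure * (1 - decline)
    years = years + 1
    print('Year', years, 'pressure =', round(pressure, 1), 'psi')

print('Pressure falls below', limit, 'psi after', years, 'years')
```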

break: exits the while loop prematurely

We can break out of an infinite loop once some condition is satisfied
In [39]: i = 0
while True: #while True is an easy way to make an infinite loop
    print(i)
    i = i+1
    if i > 5:
        print('Breaking')
        break
print('Finished')

0
1
2
3
4
5
Breaking
Finished

continue: jumps back to the top of the while loop rather than stopping it.
It stops the current iteration and continues with the next one.

In [40]: i = 0
while i <= 5:
    i = i+1
    if i ==3:
        print('Skipping 3')
        continue
    print(i)

1
2
Skipping 3
4
5
6

Multi Value Data Types and Sequences

Lists
Used to store a collection of items

Square brackets [] are used

Mutable: the length, the elements and their values can be changed

In [41]: porosity = [0.3,0.41,0.51,0.1]


In [42]: #indexing
porosity[3]

Out[42]: 0.1

In [43]: # Empty lists are often created first and populated later in the program
empty = []
i = 5
while i < 10:
    empty.append(i)
    i = i+1
empty

Out[43]: [5, 6, 7, 8, 9]

In [44]: #Strings behave like a sequence of characters, so indexing also works on strings.
String = 'Petroleum'
String[1]

Out[44]: 'e'

In [45]: #List operations
#Items at a certain index can be reassigned:
a = [7,2,9,8,10]
a[1] = 999
a

Out[45]: [7, 999, 9, 8, 10]

In [46]: #Addition (concatenating lists)
a =[1,2,3]
b =[4,5,6]
c = a+b
c

Out[46]: [1, 2, 3, 4, 5, 6]

In [47]: #Multiply
a*3

Out[47]: [1, 2, 3, 1, 2, 3, 1, 2, 3]

In [48]: # in operator
1 in a

Out[48]: True

In [49]: #not in operator
4 not in a

Out[49]: True
In [50]: #List Functions
#append: adding an item to the end of an existing list
a = [1,2,3]
a.append(4)
a

Out[50]: [1, 2, 3, 4]

In [51]: #insert: like append, but we can insert a new item at any position in the list.
a = [1,2,3,4,5,6,7]
a.insert(4,'PETROLEUM')
a

Out[51]: [1, 2, 3, 4, 'PETROLEUM', 5, 6, 7]

In [52]: #Slicing. Just like the name says.
#It helps cut a slice (sub-part) off a List.
superset = [1,2,3,4,5,6,7,8,9]

#Slicing syntax - listname[start:stop:step]; start is included, stop is not
subset1 = superset[0:5:2]
print(subset1)

subset2 = superset[:]
print(subset2) #Leaving out start or stop means 'from the beginning' or 'to the end'.

subset3 = superset[:-1] #from the start up to, but not including, the last element
print(subset3)

subset4 = superset[:] #everything
print(subset4)

[1, 3, 5]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [53]: #Reversing the list, taking every second element
reverse_set = superset[-1::-2]
#Start at index -1 (the last element) and move backwards in steps of 2; superset[::-1] would reverse the whole list.
print(reverse_set)

[9, 7, 5, 3, 1]
In [54]: #Same can be applied to strings
name = 'Petroleum_Engineering'

print(name[0])

print(name[-1])

print(name[0:5])

print(name[::2])

print(name[-1::-1])

P
g
Petro
PtoemEgneig
gnireenignE_muelorteP
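Slicing with a step of -1 and no start or stop reverses a whole sequence. Lists also have an in-place `reverse()` method, and the built-in `reversed()` returns an iterator. A short sketch comparing them:

```python
superset = [1, 2, 3, 4, 5, 6, 7, 8, 9]

full_reverse = superset[::-1]              # new list, original unchanged
every_second = superset[::-2]              # same as superset[-1::-2]
from_reversed = list(reversed(superset))   # reversed() gives an iterator; list() collects it

name = 'Petroleum_Engineering'
reversed_name = name[::-1]                 # works on strings too

superset.reverse()                         # reverses the list in place and returns None
print(full_reverse, every_second, reversed_name, superset)
```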

Tuples
A tuple is a collection which is ordered and unchangeable

Parentheses () are used

Immutable

In [55]: a = (1,2,3,4,'Hello')

In [56]: #can be accessed with index


a[4]

Out[56]: 'Hello'

In [57]: a[4] = 7

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-cb3ce5dc8467> in <module>
----> 1 a[4] = 7

TypeError: 'tuple' object does not support item assignment
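Although a tuple cannot be modified, it can be unpacked into separate variables, or converted to a list when a changed copy is needed. A sketch with illustrative rock values:

```python
rock = (0.25, 150, 'Limestone')   # (porosity, permeability in md, lithology)

# Unpacking: one variable per element, in order
poro, perm, lith = rock

# To "change" a tuple, modify a list copy and convert it back
rock_list = list(rock)
rock_list[2] = 'Shale'
new_rock = tuple(rock_list)

print(poro, perm, lith)   # 0.25 150 Limestone
print(rock, new_rock)     # the original tuple is unchanged
```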

Dictionaries
Help store data with labels

Syntax => { key : value }

Values are accessed by key, not by position (since Python 3.7, dictionaries keep insertion order)

Can be directly converted to DataFrames (tables)


In [58]: rock_properties = {'poro' : 0.25, 'perm' : 150 , 'lithology' : 'Limestone'}

print(rock_properties)

{'poro': 0.25, 'perm': 150, 'lithology': 'Limestone'}

In [59]: rock_properties['poro']

Out[59]: 0.25

In [60]: rock_properties['lithology'] = 'Shale'

In [61]: rock_properties

Out[61]: {'poro': 0.25, 'perm': 150, 'lithology': 'Shale'}
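New keys are added the same way existing ones are updated, `.get()` returns a default instead of raising `KeyError` for a missing key, and `.items()` yields key/value pairs for looping. A sketch on the same dictionary (the 'saturation' value and the 'density' lookup are illustrative):

```python
rock_properties = {'poro': 0.25, 'perm': 150, 'lithology': 'Limestone'}

# Adding a new key works exactly like updating an existing one
rock_properties['saturation'] = 0.3

# .get() falls back to a default when the key is missing
density = rock_properties.get('density', 'not measured')

# .items() gives (key, value) pairs
for key, value in rock_properties.items():
    print(key, ':', value)
```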

Sets
Curly braces are used, just like dictionaries

Unordered: can't be indexed

Can't contain duplicate values

Membership tests (x in s) are faster than for lists

In [62]: a = {1,2,3,4,5,6,7,1,2,2}

In [63]: a

Out[63]: {1, 2, 3, 4, 5, 6, 7}
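Sets also support the usual set algebra: intersection, difference and union. A sketch with hypothetical well names:

```python
# Hypothetical well names, for illustration only
producing_oil = {'W1', 'W2', 'W3', 'W4'}
producing_water = {'W3', 'W4', 'W5'}

both = producing_oil & producing_water        # intersection: wells producing oil and water
oil_only = producing_oil - producing_water    # difference: oil but no water
any_fluid = producing_oil | producing_water   # union: all wells seen in either set

print(sorted(both), sorted(oil_only), sorted(any_fluid))
print('W2' in producing_oil)   # True
```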

Summary of Data Structures


1. Dictionary - Key:Value, mutable
2. Lists - Mutable; empty lists are often created first and populated later in the program
3. Set - Uniqueness of Elements
4. Tuples - Data cannot be changed

for loops
The tool with which we can utilize the power of computers

A for loop can repeat a command thousands of times in a fraction of a second.

Iterations are always performed on iterables (collections that hand over their items one at a time).

Examples of iterables - lists, strings, tuples etc


In [64]: words = ['Hello', 'people','of', 'PDEU']

In [65]: for i in words:
    print(i)

Hello
people
of
PDEU

In [66]: Specific_gravity = [0.2,0.3,0.4,0.87,0.9,1]
for i in Specific_gravity:
    api = (141.5/i) - 131.5
    print('API gravity corresponding to Specific Gravity', i, 'is', api)

API gravity corresponding to Specific Gravity 0.2 is 576.0
API gravity corresponding to Specific Gravity 0.3 is 340.1666666666667
API gravity corresponding to Specific Gravity 0.4 is 222.25
API gravity corresponding to Specific Gravity 0.87 is 31.143678160919535
API gravity corresponding to Specific Gravity 0.9 is 25.72222222222223
API gravity corresponding to Specific Gravity 1 is 10.0

Functions
Instead of writing the same code again and again for different values, we can write a function once and call it whenever we want to do the calculation.

In [67]: def Function():
    print('Use of Function')

In [68]: #Calling Function


Function()

Use of Function

In [69]: #Returning from a function
def add(x,y):
    return x+y

In [70]: add(2,3)

Out[70]: 5

In [71]: #Once a function returns, it stops executing; any code written after the return statement is never executed
def f(x,y,z):
    return x/y +z
    print('Hello')
In [72]: f(4,2,4)

Out[72]: 6.0

In [73]: def api(x):
    api_gravity = 141.5/x - 131.5   # local name differs from the function name to avoid confusion
    print('The api gravity is',round(api_gravity))

In [74]: api(0.9)

The api gravity is 26
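The `api()` function above prints its result, so the value cannot be reused in further calculations. A variant that returns the value instead (the name `api_gravity` is hypothetical) can feed other code:

```python
def api_gravity(specific_gravity):
    """Return the API gravity for a given specific gravity (relative to water at 60 degF)."""
    return 141.5 / specific_gravity - 131.5

value = api_gravity(0.9)
print('The api gravity is', round(value))   # The api gravity is 26

# Because the value is returned, it can be used in further calculations
difference = api_gravity(0.87) - api_gravity(0.9)
print(round(difference, 2))
```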

Lambda Function
A small, anonymous function written as a single expression

In [75]: api_lambda = lambda x : 141.5/x - 131.5

In [76]: api_lambda(0.9)

Out[76]: 25.72222222222223

List Comprehensions
Quickly creating lists whose contents obey a simple rule

In [77]: cubes = [i**3 for i in range(10)]


cubes

Out[77]: [0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

In [78]: even_square = [i**2 for i in range(10) if i%2 ==0]


even_square

Out[78]: [0, 4, 16, 36, 64]
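The API-gravity for loop from earlier can be written as a single list comprehension, and an `if` clause filters the inputs. A sketch; restricting specific gravities to 0.8-1.0 is only an illustrative filter:

```python
specific_gravity = [0.2, 0.3, 0.4, 0.87, 0.9, 1]

# One API value per specific gravity, rounded to 2 decimals
api_all = [round(141.5 / sg - 131.5, 2) for sg in specific_gravity]

# Keep only specific gravities in an illustrative 0.8-1.0 range
api_filtered = [round(141.5 / sg - 131.5, 2) for sg in specific_gravity if 0.8 <= sg <= 1.0]

print(api_all)
print(api_filtered)   # [31.14, 25.72, 10.0]
```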

End of Day 1 session


Numpy
1. NumPy stands for Numerical Python. It is a library for numerical computing and linear algebra in Python.
2. NumPy is also very fast, because its core routines are implemented in C. So, NumPy operations help in computational efficiency.
3. NumPy is famous for its object, the NumPy array, which stores a collection of numbers (just like a list) in a form that can be treated and manipulated just like a single number.

In [281]: #importing numpy


import numpy as np

Numpy Arrays

In [282]: #Creating Numpy array from a python list


a=[1,2,3,4,5]
arr = np.array(a)
arr

Out[282]: array([1, 2, 3, 4, 5])

In [283]: type(arr)

Out[283]: numpy.ndarray

In [284]: type(a)

Out[284]: list

In [214]: a=[1,2,3,4,5]
b=[4,5,6,7,8]
arra = np.array(a)
arrb = np.array(b)
print(a+b) #Concatenation of lists, not addition
print(arra+arrb)#addition of elements of array

[1, 2, 3, 4, 5, 4, 5, 6, 7, 8]
[ 5 7 9 11 13]

Python lists do not support element-wise arithmetic directly, while NumPy arrays do.
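This element-wise behaviour means a formula written for one number works on a whole array. A sketch applying the API-gravity formula from earlier to a NumPy array, compared with the list-based equivalent:

```python
import numpy as np

specific_gravity = np.array([0.87, 0.9, 1.0])

# The formula is applied to every element of the array at once, no loop needed
api = 141.5 / specific_gravity - 131.5
print(api)

# With a plain list, the same calculation needs a loop (here a list comprehension)
api_from_list = [141.5 / sg - 131.5 for sg in [0.87, 0.9, 1.0]]
```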

In [285]: #Multidimensional Arrays (Matrix): passing a list of lists to np.array
My_Matrix = [[1,2,3],[4,5,6],[7,8,9]]
m = np.array(My_Matrix)
m

Out[285]: array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [286]: #shape attribute to find the shape of the array
m.shape

Out[286]: (3, 3)

In [217]: #Using built-in methods for generating arrays
Pressures = np.arange(0,5500,500) #Making array of pressure from 0 to 5000 psi with a step size of 500 psi
Pressures

Out[217]: array([ 0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000])

In [218]: #Linspace: evenly spaced numbers over a specified interval
saturations = np.linspace(0,1,100)
#Both start and stop values are included as the first and last values of the array.
#Creates a saturation array with 100 values from 0 to 1 (step = 1/99 ≈ 0.0101)
saturations

Out[218]: array([0. , 0.01010101, 0.02020202, 0.03030303, 0.04040404,
0.05050505, 0.06060606, 0.07070707, 0.08080808, 0.09090909,
0.1010101 , 0.11111111, 0.12121212, 0.13131313, 0.14141414,
0.15151515, 0.16161616, 0.17171717, 0.18181818, 0.19191919,
0.2020202 , 0.21212121, 0.22222222, 0.23232323, 0.24242424,
0.25252525, 0.26262626, 0.27272727, 0.28282828, 0.29292929,
0.3030303 , 0.31313131, 0.32323232, 0.33333333, 0.34343434,
0.35353535, 0.36363636, 0.37373737, 0.38383838, 0.39393939,
0.4040404 , 0.41414141, 0.42424242, 0.43434343, 0.44444444,
0.45454545, 0.46464646, 0.47474747, 0.48484848, 0.49494949,
0.50505051, 0.51515152, 0.52525253, 0.53535354, 0.54545455,
0.55555556, 0.56565657, 0.57575758, 0.58585859, 0.5959596 ,
0.60606061, 0.61616162, 0.62626263, 0.63636364, 0.64646465,
0.65656566, 0.66666667, 0.67676768, 0.68686869, 0.6969697 ,
0.70707071, 0.71717172, 0.72727273, 0.73737374, 0.74747475,
0.75757576, 0.76767677, 0.77777778, 0.78787879, 0.7979798 ,
0.80808081, 0.81818182, 0.82828283, 0.83838384, 0.84848485,
0.85858586, 0.86868687, 0.87878788, 0.88888889, 0.8989899 ,
0.90909091, 0.91919192, 0.92929293, 0.93939394, 0.94949495,
0.95959596, 0.96969697, 0.97979798, 0.98989899, 1. ])
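The two generators differ in how they treat the end point: `np.arange` excludes the stop value (which is why 5500 was needed above to reach 5000 psi), while `np.linspace` includes it unless `endpoint=False` is passed. A sketch:

```python
import numpy as np

p_arange = np.arange(0, 5000, 500)        # stops at 4500: the stop value is excluded
p_arange_incl = np.arange(0, 5500, 500)   # reaches 5000

sw = np.linspace(0, 1, 11)                # 11 points from 0 to 1 inclusive: step 0.1
sw_open = np.linspace(0, 1, 10, endpoint=False)   # 0.0 ... 0.9, stop value dropped

print(p_arange[-1], p_arange_incl[-1], sw[1], sw_open[-1])
```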

In [287]: #np.zeros
z = np.zeros(3)
z

Out[287]: array([0., 0., 0.])

In [288]: zm = np.zeros((3,3))
zm

Out[288]: array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
In [289]: #np.ones
o = np.ones(3)
o

Out[289]: array([1., 1., 1.])

In [290]: mo = np.ones((3,2))
mo

Out[290]: array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [291]: #eye: Creates an identity matrix
e = np.eye(3)
e

Out[291]: array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [297]: #Random: NumPy also has lots of ways to create random number arrays:
#rand : Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1)
rand = np.random.rand(2) #shape input
rand

Out[297]: array([0.79735451, 0.15014681])

In [298]: np.random.rand(5,5)

Out[298]: array([[0.99594475, 0.8458881 , 0.08822451, 0.16634309, 0.54486434],
       [0.54400554, 0.44326835, 0.77324264, 0.51943594, 0.33831159],
       [0.44580354, 0.57897933, 0.83165121, 0.44414091, 0.27359894],
       [0.08399844, 0.18347876, 0.68942538, 0.99359697, 0.14182709],
       [0.37589459, 0.12811173, 0.17572611, 0.11711842, 0.7414471 ]])

In [299]: #randn : Return a sample (or samples) from the "standard normal" distribution (mean = 0, standard deviation = 1), unlike rand, which is uniform.
np.random.randn(2)

Out[299]: array([ 1.44380265, -0.44596587])

In [300]: np.random.randn(5,5)

Out[300]: array([[ 0.04106163, -1.02717883,  0.7577701 ,  0.87903755, -1.43185131],
       [ 0.91525872,  0.86849532, -0.22200836, -1.71855818, -1.38550428],
       [-0.46713366, -0.41038961,  0.57663796, -0.09966134,  0.27420721],
       [-1.76228958,  0.10605052, -1.6727492 , -0.4981573 , -0.55207081],
       [ 0.15349532,  0.20697246,  1.15074355,  0.69342804,  0.2246914 ]])
In [306]: #randint : Return random integers from low (inclusive) to high (exclusive).
a = np.random.randint(1,100,10)
a

Out[306]: array([ 4, 28, 29, 8, 64, 62, 38, 46, 51, 85])

In [307]: np.random.randint(1,100,10)

Out[307]: array([67, 97, 43, 19, 78, 94, 44, 67, 46, 8])

In [308]: #np.random.normal : Return random samples from a 'normal' (Gaussian) distribution
#s = np.random.normal(mean, std, no. of points)
poro = np.abs(np.random.normal(0.25,0.01,20))
poro

Out[308]: array([0.24891562, 0.24185109, 0.24929802, 0.24452465, 0.23943791,
       0.23800566, 0.24806901, 0.25782996, 0.26587456, 0.24491005,
       0.25495622, 0.24106408, 0.25900647, 0.24896522, 0.23589043,
       0.2357999 , 0.25466743, 0.23189635, 0.25698376, 0.23932708])
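The arrays above change on every run. For reproducible examples, a seeded generator from `np.random.default_rng` (NumPy 1.17+) returns the same numbers each time; the seed value 42 is arbitrary. `np.clip` is shown as an alternative to `np.abs` for keeping porosity inside a physical range:

```python
import numpy as np

# A seeded generator produces the same "random" numbers on every run
rng = np.random.default_rng(seed=42)
poro = rng.normal(loc=0.25, scale=0.01, size=2000)

# Re-creating the generator with the same seed reproduces the sample exactly
poro_again = np.random.default_rng(seed=42).normal(loc=0.25, scale=0.01, size=2000)
same = np.array_equal(poro, poro_again)

# Clip to the physical range [0, 1] instead of taking absolute values
poro = np.clip(poro, 0.0, 1.0)
print(same, poro.mean(), poro.std())
```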

In [231]: #Optional: histogram of porosity samples with the normal probability density curve overlaid (uncomment to run)
#import matplotlib.pyplot as plt
#poro = np.random.normal(0.25,0.01,20)
#count, bins, ignored = plt.hist(poro, 30, density=True)
#plt.plot(bins, 1/(0.01 * np.sqrt(2 * np.pi)) *
#         np.exp( - (bins - 0.25)**2 / (2 * 0.01**2) ),
#         linewidth=2, color='r')
#plt.show()

In [309]: #Operations on Arrays


a =np.array([1,2,3,4])
b =np.array([4,5,6,7])
a+b

Out[309]: array([ 5, 7, 9, 11])

In [310]: a*b

Out[310]: array([ 4, 10, 18, 28])

In [311]: a/b

Out[311]: array([0.25 , 0.4 , 0.5 , 0.57142857])

In [312]: a**b

Out[312]: array([ 1, 32, 729, 16384], dtype=int32)

In [313]: #Dot Product


a.dot(b)

Out[313]: 60
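The dot product is the sum of the element-wise products, and for 1-D arrays the `@` operator gives the same result as `.dot()`. A quick check:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 5, 6, 7])

manual = (a * b).sum()   # 1*4 + 2*5 + 3*6 + 4*7
method = a.dot(b)
operator = a @ b         # matrix-multiplication operator, same result for 1-D arrays
print(manual, method, operator)   # 60 60 60
```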
In [314]: #len function
len(a)

Out[314]: 4

In [315]: z = np.array([a,a**b])
z

Out[315]: array([[ 1, 2, 3, 4],
       [ 1, 32, 729, 16384]])

In [316]: len(z) #number of rows

Out[316]: 2

In [317]: z.T #Transpose

Out[317]: array([[ 1, 1],
       [ 2, 32],
       [ 3, 729],
       [ 4, 16384]])

In [318]: len(z.T)

Out[318]: 4

In [319]: #dtype : to see the datatype of elements in array


z.dtype

Out[319]: dtype('int32')

In [320]: #astype: To cast to a specific datatype


z.astype(float)

Out[320]: array([[1.0000e+00, 2.0000e+00, 3.0000e+00, 4.0000e+00],
       [1.0000e+00, 3.2000e+01, 7.2900e+02, 1.6384e+04]])

In [325]: #Maths Functions for min,max,mean,std values


poro = np.abs(np.random.normal(0.25,0.01,2000))
poro

Out[325]: array([0.23618817, 0.25324195, 0.24200402, ..., 0.24714725, 0.24170127,
       0.24741424])

In [326]: poro.max()

Out[326]: 0.2943737318970742

In [327]: poro.min()

Out[327]: 0.21559888678906225

In [328]: poro.mean()

Out[328]: 0.24987956068441375
In [329]: poro.std()

Out[329]: 0.010131068219927063
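Besides min, max, mean and std, `np.percentile` and `np.median` summarize the spread of a property distribution. A sketch on a small, evenly spaced porosity array so the results are easy to verify by hand:

```python
import numpy as np

poro = np.linspace(0.20, 0.30, 11)   # 0.20, 0.21, ..., 0.30

# 10th, 50th and 90th statistical percentiles (linear interpolation by default)
q10, q50, q90 = np.percentile(poro, [10, 50, 90])
median = np.median(poro)
print(round(q10, 2), round(q50, 2), round(q90, 2))   # 0.21 0.25 0.29
```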

In [330]: #NumPy Universal Array Functions - NumPy comes with many universal array functions,
#which are essentially just mathematical operations you can use to perform the operation across the array

arr = np.random.randint(1,100,10)
arr

Out[330]: array([ 4, 43, 70, 49, 64, 26, 21, 12, 72, 93])

In [331]: #Taking Square Roots


np.sqrt(arr)

Out[331]: array([2. , 6.55743852, 8.36660027, 7. , 8. ,
       5.09901951, 4.58257569, 3.46410162, 8.48528137, 9.64365076])

In [332]: #Calculating the exponential (e^x)


np.exp(arr)

Out[332]: array([5.45981500e+01, 4.72783947e+18, 2.51543867e+30, 1.90734657e+21,
       6.23514908e+27, 1.95729609e+11, 1.31881573e+09, 1.62754791e+05,
       1.85867175e+31, 2.45124554e+40])

In [333]: np.sin(arr)

Out[333]: array([-0.7568025 , -0.83177474,  0.77389068, -0.95375265,  0.92002604,
        0.76255845,  0.83665564, -0.53657292,  0.25382336, -0.94828214])

In [334]: np.log(arr)

Out[334]: array([1.38629436, 3.76120012, 4.24849524, 3.8918203 , 4.15888308,
       3.25809654, 3.04452244, 2.48490665, 4.27666612, 4.53259949])
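One common use of these functions is working with permeability on a log scale, for example a geometric mean computed through `np.log10`. A sketch with illustrative permeability values:

```python
import numpy as np

perm = np.array([10.0, 100.0, 1000.0])   # md, illustrative values

# Universal functions act on every element at once
log_perm = np.log10(perm)                # approximately [1., 2., 3.]

# Geometric mean via logs: 10 ** mean(log10(k))
geo_mean = 10 ** log_perm.mean()
arith_mean = perm.mean()
print(log_perm, geo_mean, arith_mean)
```

The geometric mean (about 100 md here) is much less influenced by the single high value than the arithmetic mean (370 md).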

Pandas
The MS Excel of Python, but more powerful. This library helps us import, create and work with data in the form of tables.

The tables are called DataFrames.

1. We can directly convert a Dictionary into a DataFrame.
2. We can import Excel sheets or CSV files (most popular) into a DataFrame.
3. We can manipulate and use these tables in a user-friendly way.
In [335]: #Converting Dictionary into DataFrame
#Step 1: Import Pandas with an alias 'pd'
import pandas as pd

#Step 2: Create your dictionary
Rock_Properties = {'phi': [0.2,0.40,0.30,0.25,0.270],
                   'perm': [100,20,150,130,145],
                   'lith': ['sandstone','shale','limestone','limestone','sandstone']}

#Step 3: Create your Table.
rock_table = pd.DataFrame(Rock_Properties)

#Step 4: Print your table.
rock_table

Out[335]:
    phi  perm       lith
0  0.20   100  sandstone
1  0.40    20      shale
2  0.30   150  limestone
3  0.25   130  limestone
4  0.27   145  sandstone
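Two everyday operations on such a table are filtering rows with a boolean condition and summarizing by category with `groupby`. A sketch on the same rock table (the 120 md cut-off is illustrative):

```python
import pandas as pd

rock_table = pd.DataFrame({'phi': [0.2, 0.40, 0.30, 0.25, 0.270],
                           'perm': [100, 20, 150, 130, 145],
                           'lith': ['sandstone', 'shale', 'limestone', 'limestone', 'sandstone']})

# Boolean mask: keep rows whose permeability exceeds an illustrative 120 md cut-off
good_rock = rock_table[rock_table['perm'] > 120]

# Average porosity and permeability for each lithology
by_lith = rock_table.groupby('lith')[['phi', 'perm']].mean()
print(good_rock)
print(by_lith)
```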

In [336]: #Adding a new column
rock_table['Saturation'] = [0.14,0.25,0.45,0.37,0.28]
rock_table

Out[336]:
    phi  perm       lith  Saturation
0  0.20   100  sandstone        0.14
1  0.40    20      shale        0.25
2  0.30   150  limestone        0.45
3  0.25   130  limestone        0.37
4  0.27   145  sandstone        0.28

In [337]: #Importing from csv or excel files

In [338]: volve = pd.read_csv('vpd.csv')
#Similarly, an Excel file can be read by-
#df = pd.read_excel('path/filename.xlsx')
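When the CSV file is not at hand, `io.StringIO` lets `pd.read_csv` read CSV text from a string, which is convenient for experiments. The sketch below uses three rows taken from the 15/9-F-12 output shown later in this notebook and converts `DATEPRD` to real dates; the format string `'%d-%b-%y'` is an assumption based on values such as `12-Feb-08`:

```python
import io

import pandas as pd

csv_text = """DATEPRD,NPD_WELL_BORE_NAME,BORE_OIL_VOL
12-Feb-08,15/9-F-12,285.0
13-Feb-08,15/9-F-12,1870.0
14-Feb-08,15/9-F-12,3124.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert the text dates into datetime values (day-abbreviated month-two digit year)
df['DATEPRD'] = pd.to_datetime(df['DATEPRD'], format='%d-%b-%y')
print(df.dtypes)
print(df)
```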
In [343]: volve.head(10)

Out[343]:
     DATEPRD  NPD_WELL_BORE_CODE NPD_WELL_BORE_NAME  ON_STREAM_HRS  AVG_DOWNHOLE_PRE
0  07-Apr-14                7405         15/9-F-1 C            0.0
1  08-Apr-14                7405         15/9-F-1 C            0.0
2  09-Apr-14                7405         15/9-F-1 C            0.0
3  10-Apr-14                7405         15/9-F-1 C            0.0
4  11-Apr-14                7405         15/9-F-1 C            0.0
5  12-Apr-14                7405         15/9-F-1 C            0.0
6  13-Apr-14                7405         15/9-F-1 C            0.0
7  14-Apr-14                7405         15/9-F-1 C            0.0
8  15-Apr-14                7405         15/9-F-1 C            0.0
9  16-Apr-14                7405         15/9-F-1 C            0.0

In [354]: volve.describe()

Out[354]:
       NPD_WELL_BORE_CODE  ON_STREAM_HRS  AVG_DOWNHOLE_PRESSURE  AVG_DOWNHOLE_TEMPE
count        15634.000000   15349.000000            8980.000000
mean          5908.581745      19.994172             181.803870
std            649.231622       8.369911             109.712365
min           5351.000000       0.000000               0.000000
25%           5599.000000      24.000000               0.000000
50%           5693.000000      24.000000             232.897000
75%           5769.000000      24.000000             255.401250
max           7405.000000      25.000000             397.589000

In [346]: #shape
volve.shape

Out[346]: (15634, 19)

In [347]: #columns: to get the names of the columns
volve.columns

Out[347]: Index(['DATEPRD', 'NPD_WELL_BORE_CODE', 'NPD_WELL_BORE_NAME', 'ON_STREAM_HRS',
       'AVG_DOWNHOLE_PRESSURE', 'AVG_DOWNHOLE_TEMPERATURE', 'AVG_DP_TUBING',
       'AVG_ANNULUS_PRESS', 'AVG_CHOKE_SIZE_P', 'AVG_CHOKE_UOM', 'AVG_WHP_P',
       'AVG_WHT_P', 'DP_CHOKE_SIZE', 'BORE_OIL_VOL', 'BORE_GAS_VOL',
       'BORE_WAT_VOL', 'BORE_WI_VOL', 'FLOW_KIND', 'WELL_TYPE'],
      dtype='object')
In [349]: print(volve['NPD_WELL_BORE_NAME'].value_counts())

15/9-F-4 3327
15/9-F-5 3306
15/9-F-14 3056
15/9-F-12 3056
15/9-F-11 1165
15/9-F-15 D 978
15/9-F-1 C 746
Name: NPD_WELL_BORE_NAME, dtype: int64

In [350]: volve.groupby(['NPD_WELL_BORE_NAME']).agg({'NPD_WELL_BORE_NAME':'count'})

Out[350]:
                    NPD_WELL_BORE_NAME
NPD_WELL_BORE_NAME
15/9-F-1 C                         746
15/9-F-11                         1165
15/9-F-12                         3056
15/9-F-14                         3056
15/9-F-15 D                        978
15/9-F-4                          3327
15/9-F-5                          3306

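In [352]: #Conditional DataFrame slicing
pf12 = volve['NPD_WELL_BORE_NAME'] == '15/9-F-12' #gives a boolean Series (True for rows of well 15/9-F-12)
volve_pf12 = volve[pf12].copy() #.copy() gives an independent table, so the in-place changes below avoid pandas' SettingWithCopyWarning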
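Several conditions can be combined with `&` (and) and `|` (or), with each condition in its own parentheses; `DataFrame.query()` expresses the same filter as a string. A sketch on five daily rows copied from the 15/9-F-12 output in this notebook:

```python
import pandas as pd

# Five daily rows taken from the 15/9-F-12 output in this notebook
df = pd.DataFrame({
    'DATEPRD': ['12-Feb-08', '13-Feb-08', '14-Feb-08', '15-Feb-08', '16-Feb-08'],
    'ON_STREAM_HRS': [11.50, 24.00, 22.50, 23.15, 24.00],
    'BORE_OIL_VOL': [285.0, 1870.0, 3124.0, 2608.0, 3052.0],
})

# Full production days with oil output; each condition needs its own parentheses
full_days = df[(df['ON_STREAM_HRS'] == 24) & (df['BORE_OIL_VOL'] > 0)].copy()

# The same filter written with query()
full_days_q = df.query('ON_STREAM_HRS == 24 and BORE_OIL_VOL > 0')
print(full_days)
```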

In [353]: volve_pf12.head()

Out[353]:
        DATEPRD  NPD_WELL_BORE_CODE NPD_WELL_BORE_NAME  ON_STREAM_HRS  AVG_DOWNHOLE_
1911  12-Feb-08                5599          15/9-F-12          11.50
1912  13-Feb-08                5599          15/9-F-12          24.00
1913  14-Feb-08                5599          15/9-F-12          22.50
1914  15-Feb-08                5599          15/9-F-12          23.15
1915  16-Feb-08                5599          15/9-F-12          24.00


In [265]: #Statistical description of Dataset
volve_pf12.describe()

Out[265]:
       NPD_WELL_BORE_CODE  ON_STREAM_HRS  AVG_DOWNHOLE_PRESSURE  AVG_DOWNHOLE_TEMPE
count              3056.0    3056.000000            3050.000000
mean               5599.0      21.336489              80.729069
std                   0.0       6.889030             120.086898
min                5599.0       0.000000               0.000000
25%                5599.0      24.000000               0.000000
50%                5599.0      24.000000               0.000000
75%                5599.0      24.000000             239.423000
max                5599.0      25.000000             317.701000

In [355]: #info: data types and the number of non-null values in each column
volve_pf12.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3056 entries, 1911 to 4966
Data columns (total 19 columns):
DATEPRD 3056 non-null object
NPD_WELL_BORE_CODE 3056 non-null int64
NPD_WELL_BORE_NAME 3056 non-null object
ON_STREAM_HRS 3056 non-null float64
AVG_DOWNHOLE_PRESSURE 3050 non-null float64
AVG_DOWNHOLE_TEMPERATURE 3050 non-null float64
AVG_DP_TUBING 3050 non-null float64
AVG_ANNULUS_PRESS 3043 non-null float64
AVG_CHOKE_SIZE_P 3012 non-null float64
AVG_CHOKE_UOM 3056 non-null object
AVG_WHP_P 3056 non-null float64
AVG_WHT_P 3056 non-null float64
DP_CHOKE_SIZE 3056 non-null float64
BORE_OIL_VOL 3056 non-null float64
BORE_GAS_VOL 3056 non-null float64
BORE_WAT_VOL 3056 non-null float64
BORE_WI_VOL 0 non-null float64
FLOW_KIND 3056 non-null object
WELL_TYPE 3056 non-null object
dtypes: float64(13), int64(1), object(5)
memory usage: 477.5+ KB
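`info()` shows non-null counts; `isnull().sum()` gives the number of missing values directly, and `dropna()` or `ffill()` are two common ways of handling them. A sketch on a small table with illustrative gaps (which option is appropriate depends on the data):

```python
import numpy as np
import pandas as pd

# Illustrative values with two missing pressures
df = pd.DataFrame({
    'AVG_DOWNHOLE_PRESSURE': [308.056, np.nan, 295.586, np.nan],
    'BORE_OIL_VOL': [285.0, 1870.0, 3124.0, 2608.0],
})

# Number of missing values per column
missing = df.isnull().sum()

# Option 1: drop rows that contain any missing value
dropped = df.dropna()

# Option 2: carry the last valid value forward into the gaps
filled = df.ffill()
print(missing)
print(filled)
```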
In [267]: import seaborn as sns
sns.heatmap(volve_pf12.isnull()) #highlights where values are missing (null)

Out[267]: <matplotlib.axes._subplots.AxesSubplot at 0x27e4bb9bcc8>


In [356]: #Dropping columns
volve_pf12.drop(['NPD_WELL_BORE_CODE','BORE_WI_VOL','NPD_WELL_BORE_NAME'],axis = 1, inplace = True)
volve_pf12

Out[356]:
        DATEPRD  ON_STREAM_HRS  AVG_DOWNHOLE_PRESSURE  AVG_DOWNHOLE_TEMPERATURE
1911  12-Feb-08          11.50                308.056                   104.418
1912  13-Feb-08          24.00                303.034                   105.403
1913  14-Feb-08          22.50                295.586                   105.775
1914  15-Feb-08          23.15                297.663                   105.752
1915  16-Feb-08          24.00                295.936                   105.811
...         ...            ...                    ...
4962  13-Sep-16           0.00                  0.000
4963  14-Sep-16           0.00                  0.000
4964  15-Sep-16           0.00                  0.000
4965  16-Sep-16           0.00                  0.000
4966  17-Sep-16           0.00                  0.000

3056 rows × 16 columns

In [357]: volve_pf12.set_index('DATEPRD',inplace = True)

In [358]: volve_pf12.head()

Out[358]:
           ON_STREAM_HRS  AVG_DOWNHOLE_PRESSURE  AVG_DOWNHOLE_TEMPERATURE
DATEPRD
12-Feb-08          11.50                308.056                   104.418
13-Feb-08          24.00                303.034                   105.403
14-Feb-08          22.50                295.586                   105.775
15-Feb-08          23.15                297.663                   105.752
16-Feb-08          24.00                295.936                   105.811

In [360]: volve_pf12['AVG_DOWNHOLE_PRESSURE']

Out[360]: DATEPRD
12-Feb-08 308.056
13-Feb-08 303.034
14-Feb-08 295.586
15-Feb-08 297.663
16-Feb-08 295.936
...
13-Sep-16 0.000
14-Sep-16 0.000
15-Sep-16 0.000
16-Sep-16 0.000
17-Sep-16 0.000
Name: AVG_DOWNHOLE_PRESSURE, Length: 3056, dtype: float64
In [361]: volve_pf12[['AVG_DOWNHOLE_PRESSURE']]

Out[361]:
           AVG_DOWNHOLE_PRESSURE
DATEPRD
12-Feb-08                308.056
13-Feb-08                303.034
14-Feb-08                295.586
15-Feb-08                297.663
16-Feb-08                295.936
...                          ...
13-Sep-16                  0.000
14-Sep-16                  0.000
15-Sep-16                  0.000
16-Sep-16                  0.000
17-Sep-16                  0.000

3056 rows × 1 columns

In [362]: a =volve_pf12[['AVG_DOWNHOLE_PRESSURE','BORE_OIL_VOL']]

In [363]: a

Out[363]:
           AVG_DOWNHOLE_PRESSURE  BORE_OIL_VOL
DATEPRD
12-Feb-08                308.056         285.0
13-Feb-08                303.034        1870.0
14-Feb-08                295.586        3124.0
15-Feb-08                297.663        2608.0
16-Feb-08                295.936        3052.0
...                          ...           ...
13-Sep-16                  0.000           0.0
14-Sep-16                  0.000           0.0
15-Sep-16                  0.000           0.0
16-Sep-16                  0.000           0.0
17-Sep-16                  0.000           0.0

3056 rows × 2 columns


In [364]: #iloc: select a row by its integer position
volve_pf12.iloc[2]

Out[364]: ON_STREAM_HRS 22.5
AVG_DOWNHOLE_PRESSURE 295.586
AVG_DOWNHOLE_TEMPERATURE 105.775
AVG_DP_TUBING 181.868
AVG_ANNULUS_PRESS 12.66
AVG_CHOKE_SIZE_P 31.25
AVG_CHOKE_UOM %
AVG_WHP_P 113.718
AVG_WHT_P 72.738
DP_CHOKE_SIZE 80.12
BORE_OIL_VOL 3124
BORE_GAS_VOL 509955
BORE_WAT_VOL 1
FLOW_KIND production
WELL_TYPE OP
Name: 14-Feb-08, dtype: object

In [365]: #loc: select a row by its index label (here the date string)
volve_pf12.loc['14-Feb-08']

Out[365]: ON_STREAM_HRS 22.5
AVG_DOWNHOLE_PRESSURE 295.586
AVG_DOWNHOLE_TEMPERATURE 105.775
AVG_DP_TUBING 181.868
AVG_ANNULUS_PRESS 12.66
AVG_CHOKE_SIZE_P 31.25
AVG_CHOKE_UOM %
AVG_WHP_P 113.718
AVG_WHT_P 72.738
DP_CHOKE_SIZE 80.12
BORE_OIL_VOL 3124
BORE_GAS_VOL 509955
BORE_WAT_VOL 1
FLOW_KIND production
WELL_TYPE OP
Name: 14-Feb-08, dtype: object
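In this notebook the index holds date strings, so `loc` matches the exact text. Converting the index to datetimes additionally allows date-range slicing and grouping by calendar period. A sketch on the five 15/9-F-12 oil volumes shown above; the `'%d-%b-%y'` format string is an assumption based on values like `14-Feb-08`:

```python
import pandas as pd

dates = pd.to_datetime(['12-Feb-08', '13-Feb-08', '14-Feb-08', '15-Feb-08', '16-Feb-08'],
                       format='%d-%b-%y')
oil = pd.Series([285.0, 1870.0, 3124.0, 2608.0, 3052.0], index=dates, name='BORE_OIL_VOL')

third_day = oil.iloc[2]                          # by integer position
same_day = oil.loc[pd.Timestamp('2008-02-14')]   # by index label
window = oil.loc['2008-02-13':'2008-02-15']      # label slices include both end points

# Total per calendar month (all five days fall in February 2008)
monthly_total = oil.groupby(oil.index.to_period('M')).sum()
print(third_day, same_day)
print(window)
print(monthly_total)
```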
In [276]: #Plotting each column against the date on the x axis
#built-in plot function of pandas
volve_pf12.plot(figsize = (12,10),subplots = True)

Out[276]: array([<matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4BCC6A88>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D541F48>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4BCDCEC8>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D5B0308>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D5DEEC8>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D61D0C8>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D650FC8>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D8E9F88>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D8F3B08>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D92CB88>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D991F08>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x0000027E4D9CBE88>],
      dtype=object)
Visualization
Library: Matplotlib

In [366]: #Step 1: Import the library(s).
import matplotlib.pyplot as plt

#Step 2: create numpy arrays for x and y.
x = np.linspace(-10,10)
y = x**2

#Step 3: Plot now.
plt.plot(x,y)

Out[366]: [<matplotlib.lines.Line2D at 0x27e4e647108>]


In [368]: #Customization
plt.style.use('dark_background')
plt.figure(figsize=(6,4)) #6 x 4 inch canvas (width x height)
# plt.style.use('default')

# 1. generate the plot. Add a label.
plt.plot(x,y,label='It is a parabola')

#2. Set x axis label
plt.xlabel('This is X-Axis')

#3. Set y axis label.
plt.ylabel('This is Y-Axis')

#4. Set the title.
plt.title('TITLE here.')

#5. set the grid.
plt.grid(True)

#6. display the label in a legend.
plt.legend(loc='best')

Out[368]: <matplotlib.legend.Legend at 0x27e4e9c9ac8>
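Matplotlib also offers an object-oriented style: create the figure and axes explicitly with `plt.subplots()` and call methods on the axes, which scales better to figures with several panels. A sketch reproducing the customized parabola; using the non-interactive `Agg` backend and saving to an in-memory buffer are assumptions so it runs without a display:

```python
import io

import matplotlib
matplotlib.use('Agg')                    # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10)
y = x**2

fig, ax = plt.subplots(figsize=(6, 4))   # width x height in inches
ax.plot(x, y, label='It is a parabola')
ax.set_xlabel('This is X-Axis')
ax.set_ylabel('This is Y-Axis')
ax.set_title('TITLE here.')
ax.grid(True)
ax.legend(loc='best')

# Save to an in-memory PNG instead of a file on disk, then release the figure
buf = io.BytesIO()
fig.savefig(buf, format='png')
plt.close(fig)
```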

In [279]: print(plt.style.available)

['bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark-palette', 'seaborn-dark', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'seaborn', 'Solarize_Light2', 'tableau-colorblind10', '_classic_test']
In [369]: plt.figure(figsize=(16,9))
plt.plot(pd.to_datetime(volve_pf12.index, format='%d-%b-%y'),volve_pf12['BORE_OIL_VOL']) #explicit format for dates like 12-Feb-08
plt.xlabel('Time')
plt.ylabel('Oil Production')
plt.title('Oil Production vs Time')

Out[369]: Text(0.5, 1.0, 'Oil Production vs Time')
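Alongside the daily rate, cumulative production is often plotted against time, and `Series.cumsum()` computes the running total. A sketch on the five daily `BORE_OIL_VOL` values shown earlier for 15/9-F-12; on the full table the same idea would be `volve_pf12['BORE_OIL_VOL'].cumsum()`:

```python
import pandas as pd

dates = pd.to_datetime(['12-Feb-08', '13-Feb-08', '14-Feb-08', '15-Feb-08', '16-Feb-08'],
                       format='%d-%b-%y')
daily_oil = pd.Series([285.0, 1870.0, 3124.0, 2608.0, 3052.0], index=dates)

# Running total of the daily volumes
cumulative_oil = daily_oil.cumsum()
print(cumulative_oil)
```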

End of Day-2 Session
