Python for Hydrology


Training Course Programme
Alex Rodrigues – Digital Solutions Architect
Session IV - 01 March 2022

Revision to exercises from session III


Pandas Data Structures: Indexing and Slicing
Joins
Merge and Joins in Python
Date time objects
String Manipulation – An introduction to regular expressions
Exercises
Revision to exercises from session III
Topics covered

​Last session we covered:


 descriptive statistics in pure Python
 descriptive statistics with available Python libraries
 Exploring the data
 Simple linear regression
 Pandas
 Grouping
 functions on columns
 functions on rows
 Visualisations - plotly

Mott MacDonald 3 February 2022


Revision to exercises from session III
Some relevant code lines

Practice 01
• Consolidate session III

General:
 .append()
 for i in list_numbers:
 for j in list_numbers:
 if i != j:
 data = data.dropna()

Packages:
 import numpy as np
 from sklearn import linear_model
 import statistics
 import scipy.stats as …

Data manipulation:
 .groupby().mean()
 .fillna(0)
 .fillna(method='ffill')
 .fillna(method='bfill')
 .describe()
 .info()
 df1[['OMS', 'Team']]
 df1.drop(['OMS', 'Team'], axis=1)
 df1[df1['OMS'] >= 5]
 df1.shape
 df1[3:6]
 .isin()

Visualisation:
 px.histogram()
 barmode='group'
 color='Treatment_Station'

Linear regression:
 model = linear_model.LinearRegression()
 model.fit()
 model.score()
 model.intercept_
 model.coef_

Math & statistics:
 a % b
 range()
 len()



Data Manipulation – selection in dataframe
Pandas Data Structures: Indexing and Slicing

​We have seen many ways to create Series and DataFrames. We also learned how to access and work with complete
columns. Pandas and Python offer various ways to access and change values in DataFrames and Series,
whether we need to change a single value or every value that matches a given criterion.

Practice 02
• Working with whole dataframe. Indexing and Slicing

Keep in mind
 s_df.loc[] - Refers only to the index labels
 s_df.iloc[] - Refers only to the integer location
 s_df.at[] - Access a single value for a row/column label pair
 s_df.iat[] - Access a single value for a row/column pair by integer position

Keep in mind
 The indexing attributes (.loc, .iloc, .at, .iat) can be used to get and set values in the DataFrame. The older .ix attribute was removed in pandas 1.0 and should not be used.
 .loc and .iloc accept Python slice objects, but .at and .iat do not.
 .loc can also accept Boolean Series arguments
 Avoid chaining in the form df[col_indexer][row_indexer]
 Label slices are inclusive, integer slices exclusive
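The points above can be tried on a small table. This is only a minimal sketch: the station codes, column names and values are made up for illustration and do not come from the course data.

```python
import pandas as pd

# Hypothetical flows (m3/s) and levels (m), indexed by a made-up station code.
s_df = pd.DataFrame(
    {"flow": [12.5, 8.1, 15.3, 9.7], "level": [1.2, 0.9, 1.6, 1.0]},
    index=["ST01", "ST02", "ST03", "ST04"],
)

by_label = s_df.loc["ST02":"ST03"]      # label slice: both ends are included
by_position = s_df.iloc[1:3]            # integer slice: the end is excluded
single_label = s_df.at["ST03", "flow"]  # one value by row/column labels
single_pos = s_df.iat[2, 0]             # the same value by integer position

# Boolean Series inside .loc: set level to 0 wherever flow is below 10.
# A single .loc call is used instead of chained indexing.
s_df.loc[s_df["flow"] < 10, "level"] = 0.0
```

Both slices return the rows for ST02 and ST03, which shows the inclusive/exclusive difference between label and integer slicing.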
Joins
Inner Join

​The Inner join is one of the most common types of join we work with. It returns a dataframe with only those rows
that have matching key values in both dataframes.

​Keep in mind
 An inner join requires each row in the two joined dataframes to have matching column values. We can think of it as the intersection of two sets.
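As a small illustration of the intersection idea, the sketch below joins two made-up tables on a shared key. The table names, station codes and river names are hypothetical.

```python
import pandas as pd

# Hypothetical station register and flow readings sharing the key "station".
stations = pd.DataFrame(
    {"station": ["ST01", "ST02", "ST03"], "river": ["Avon", "Exe", "Dart"]}
)
flows = pd.DataFrame(
    {"station": ["ST02", "ST03", "ST04"], "flow": [8.1, 15.3, 9.7]}
)

# Only the keys present in BOTH tables survive: ST02 and ST03.
inner = stations.merge(flows, on="station", how="inner")
```

ST01 (no reading) and ST04 (not in the register) are both dropped.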



Joins
Full Join

The Full Join, also called Full Outer Join, returns all records from both dataframes, whether or not they have a
match in the other dataframe.

​Keep in mind
 When a row in one dataframe has no match in the other, the resulting dataframe has NaN in every column that comes from the dataframe lacking the matching row.
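A minimal sketch of where the NaN values appear in a full join, using made-up station and flow tables (all names and values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical tables: ST01 has no flow reading, ST04 is not in the register.
stations = pd.DataFrame(
    {"station": ["ST01", "ST02", "ST03"], "river": ["Avon", "Exe", "Dart"]}
)
flows = pd.DataFrame(
    {"station": ["ST02", "ST03", "ST04"], "flow": [8.1, 15.3, 9.7]}
)

# Every key from either table is kept; gaps are filled with NaN.
outer = stations.merge(flows, on="station", how="outer")

# Rows where a value is missing on one side.
missing_flow = outer[outer["flow"].isna()]["station"].tolist()
missing_river = outer[outer["river"].isna()]["station"].tolist()
```

Here `missing_flow` holds the station that exists only in the left table and `missing_river` the one that exists only in the right table.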



Joins
Left Join

​The left join combines the columns on a common dimension, returning all rows from the first table with the
matching rows in the second table. Where there is no match, the columns from the second table are NULL (NaN in pandas).

​Keep in mind
 All the non-matching rows of the left dataframe contain NaN in the columns that come from the right dataframe. A left join is simply an inner join plus the non-matching rows of the left dataframe.
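The "inner join plus unmatched left rows" description can be checked on a toy example. The sketch below uses hypothetical station and flow tables; dropping the NaN rows from the left join should give back the inner join.

```python
import pandas as pd

# Hypothetical tables: ST01 has no reading, ST04 is missing from the register.
stations = pd.DataFrame(
    {"station": ["ST01", "ST02", "ST03"], "river": ["Avon", "Exe", "Dart"]}
)
flows = pd.DataFrame(
    {"station": ["ST02", "ST03", "ST04"], "flow": [8.1, 15.3, 9.7]}
)

# All rows of the left table are kept; ST01 gets NaN in the "flow" column.
left = stations.merge(flows, on="station", how="left")

# Removing the unmatched rows leaves exactly the inner-join keys.
matched_only = left.dropna(subset=["flow"])
inner = stations.merge(flows, on="station", how="inner")
```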



Joins
Right Join

The Right join, also called Right Outer Join, mirrors the Left Outer Join.

​Keep in mind
 The only difference is that all the rows of the right dataframe are kept as they are, together with only those rows of the left dataframe that have a match in the right dataframe.
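One way to see the symmetry is to compare a right join with a left join in which the two tables swap roles. This is a sketch on made-up data; the table and column names are assumptions.

```python
import pandas as pd

# Hypothetical tables sharing the key "station".
stations = pd.DataFrame(
    {"station": ["ST01", "ST02", "ST03"], "river": ["Avon", "Exe", "Dart"]}
)
flows = pd.DataFrame(
    {"station": ["ST02", "ST03", "ST04"], "flow": [8.1, 15.3, 9.7]}
)

# All rows of the right table (flows) are kept; ST04 gets NaN for "river".
right = stations.merge(flows, on="station", how="right")

# The same rows come out of a left join with the tables swapped;
# only the column order differs.
mirrored = flows.merge(stations, on="station", how="left")
```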

Merge and Joins in Python
merge methods

Users of SQL and other relational database tools will be familiar with the terminology used to describe join
operations between two tables (here, DataFrame objects).

​It is important to understand which cases we are dealing with:


 one-to-one: joining two DataFrame objects on their indexes (which must contain unique values).

 many-to-one: joining an index (unique) to one or more columns in a different DataFrame.

 many-to-many: joining columns on columns.
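When it is unclear which of these cases applies, the `validate` argument of `merge` can check the assumption. The sketch below joins on a column rather than on the index and uses made-up station data; it shows a many-to-one merge passing the check and a one-to-one check failing.

```python
import pandas as pd

# Hypothetical data: one row per station, several readings per station.
stations = pd.DataFrame({"station": ["ST01", "ST02"], "river": ["Avon", "Exe"]})
readings = pd.DataFrame(
    {"station": ["ST01", "ST01", "ST02"], "flow": [12.5, 11.9, 8.1]}
)

# many-to-one: many readings point at one station row, so this passes.
merged = readings.merge(stations, on="station", how="left", validate="many_to_one")

# one-to-one does not hold because "ST01" appears twice in readings,
# so pandas raises MergeError instead of silently duplicating rows.
try:
    readings.merge(stations, on="station", validate="one_to_one")
    one_to_one_holds = True
except pd.errors.MergeError:
    one_to_one_holds = False
```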



Merge and Joins in Python
merge syntax

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

​Keep in mind
 Inner join: to keep only the rows that match in both data frames, specify how='inner'.
 Outer join (full outer join): to keep all rows from both data frames, specify how='outer'.
 Left join (left outer join): to include all rows of the left data frame and only those from the right that match, specify how='left'.
 Right join (right outer join): to include all rows of the right data frame and only those from the left that match, specify how='right'.
 on: column names to join on. They must be found in both the left and right DataFrame objects.
 how: type of join to perform: 'left', 'right', 'outer' or 'inner'. The default is 'inner'.
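Some of the other arguments in the signature are easier to understand from an example. The sketch below assumes two made-up tables whose key columns have different names and whose value columns clash, and uses `left_on`/`right_on`, `suffixes` and `indicator` together.

```python
import pandas as pd

# Hypothetical tables: the key is "gauge_id" on the left and "station" on
# the right, and both tables have a column called "value".
levels = pd.DataFrame({"gauge_id": ["ST01", "ST02"], "value": [1.2, 0.9]})
flows = pd.DataFrame({"station": ["ST02", "ST03"], "value": [8.1, 15.3]})

merged = levels.merge(
    flows,
    how="outer",
    left_on="gauge_id",            # key column in the left table
    right_on="station",            # key column in the right table
    suffixes=("_level", "_flow"),  # rename the clashing "value" columns
    indicator=True,                # add a "_merge" column with the row origin
)
```

The `_merge` column takes the values `left_only`, `right_only` or `both`, which is a quick way to audit which rows found a match.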



Merge and Joins in Python
Concat

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

​Keep in mind
 Defaults to outer join with the option for inner join
 concat() is a pandas function, called as pd.concat()
 concat() combines pandas DataFrames vertically or horizontally
 Raises an error on duplicate index values when verify_integrity=True; horizontal concatenation can also fail if an index contains duplicates.

Practice 03
• Merge in Python
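A short sketch of these options on two made-up monthly tables (the names `jan` and `feb` and all values are illustrative):

```python
import pandas as pd

# Hypothetical monthly tables; feb has one extra column.
jan = pd.DataFrame({"station": ["ST01", "ST02"], "flow": [12.5, 8.1]})
feb = pd.DataFrame(
    {"station": ["ST01", "ST02"], "flow": [10.2, 7.4], "level": [1.1, 0.8]}
)

# Vertical stacking, outer join on columns: "level" is NaN for the jan rows.
stacked = pd.concat([jan, feb], ignore_index=True)

# Vertical stacking, inner join on columns: only the shared columns remain.
shared = pd.concat([jan, feb], join="inner", ignore_index=True)

# Horizontal concatenation, with keys labelling the origin of each block.
side_by_side = pd.concat([jan, feb], axis=1, keys=["jan", "feb"])

# Both tables use the index 0, 1, so verify_integrity=True raises ValueError.
try:
    pd.concat([jan, feb], verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True
```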

Date time objects- Introduction
Using pandas, we can convert a column (string/object or integer type) to datetime using
the to_datetime() function. We can also parse dates while reading the data from
an external source, such as a CSV file.
​pd.to_datetime(your_date_data, format="Your_datetime_format")

​Keep in mind
 We can convert an object (string) column to datetime
 We can convert an integer column to datetime
 We can convert a column to datetime when reading a CSV file

Practice 04
• Miscellaneous
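The three cases can be sketched as follows. The column names, date formats and the in-memory CSV text are assumptions made for the example; a real file path would be passed to `read_csv` in the same way.

```python
import io

import pandas as pd

# Hypothetical columns holding the same dates as text and as integers.
df = pd.DataFrame(
    {"date_text": ["01/03/2022", "02/03/2022"], "date_int": [20220301, 20220302]}
)

# String column: day/month/year given explicitly through the format string.
df["from_text"] = pd.to_datetime(df["date_text"], format="%d/%m/%Y")

# Integer column in YYYYMMDD form.
df["from_int"] = pd.to_datetime(df["date_int"], format="%Y%m%d")

# Parsing while reading: an in-memory CSV stands in for a file on disk.
csv_text = "date,flow\n2022-03-01,12.5\n2022-03-02,8.1\n"
from_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
```

Stating the format explicitly avoids ambiguity between day-first and month-first dates such as 01/03/2022.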

String Manipulation – An introduction to regular expressions
Regular expressions (RegEx) are a useful language for matching text patterns. The
Python "re" module provides regular expression support.
​import re

​Useful functions
 findall: Returns a list containing all matches
 search: Returns a Match object if there is a match anywhere in the string
 split: Returns a list where the string has been split at each match
 sub: Replaces one or many matches with a string

Practice 05
• Regular expressions
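The four functions can be tried on a single string. The text below is a made-up log line, and the patterns are only one possible way to match it.

```python
import re

# Hypothetical log line with two station readings.
log_line = "ST01 flow=12.5 m3/s; ST02 flow=8.1 m3/s"

# findall: every station code ("ST" followed by digits).
station_codes = re.findall(r"ST\d+", log_line)

# search: the first decimal number after "flow=", captured in a group.
first_match = re.search(r"flow=(\d+\.\d+)", log_line)
first_flow = float(first_match.group(1))

# split: break the line at each semicolon and any spaces after it.
records = re.split(r";\s*", log_line)

# sub: replace every decimal number with a placeholder.
masked = re.sub(r"\d+\.\d+", "?", log_line)
```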



Exercises

Practice 06
• Miscellaneous
Thank you
