General:
.append()
for i in list_numbers:
    for j in list_numbers:
        if i != j:
range()
len()

Packages:
import numpy as np
from sklearn import linear_model
import statistics
import scipy.stats as

Data manipulation:
.groupby().mean()
.fillna(0)
data = data.dropna()
.fillna(method='ffill')
.fillna(method='bfill')
.describe()
.info()
df1[['OMS', 'Team']]
df1.drop(['OMS', 'Team'], axis=1)
df1[df1['OMS'] >= 5]
df1.shape
df1[3:6]
.isin()

Visualisation:
px.histogram()
barmode='group'
color='Treatment_Station'

Linear regression:
model = linear_model.LinearRegression()
model.fit()
model.score()
model.intercept_
model.coef_

Math & statistics:
a % b
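The data manipulation entries above can be tried out on a small frame. This is a minimal sketch: only the column names OMS and Team come from the cheat sheet, and the values are made up for illustration. It uses .ffill() in place of .fillna(method='ffill'), since recent pandas releases deprecate the method= argument.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names OMS and Team come from the slides.
df1 = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C"],
    "OMS": [4.0, np.nan, 7.0, 5.0, np.nan],
})

filled_zero = df1["OMS"].fillna(0)   # replace missing values with 0
filled_fwd = df1["OMS"].ffill()      # carry the last valid value forward
no_missing = df1.dropna()            # drop rows that contain a missing value

team_means = no_missing.groupby("Team")["OMS"].mean()  # mean OMS per team
high_oms = df1[df1["OMS"] >= 5]                         # boolean filter
in_ab = df1[df1["Team"].isin(["A", "B"])]               # membership filter
without_team = df1.drop(["Team"], axis=1)               # drop a column

print(team_means)
print(high_oms)
```

Each step returns a new object, so df1 itself is unchanged unless the result is assigned back, as in data = data.dropna().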
We have seen many ways to create Series and DataFrames, and we have learned how to access and work with complete
columns. Pandas and Python also offer several ways to access and change values in DataFrames and Series,
whether a single value or all values that match a certain criterion.
Practice 02
• Working with whole dataframe. Indexing and Slicing
Keep in mind
s_df.loc[] - Refers only to the index labels
s_df.iloc[] - Refers only to the integer location
s_df.at[] - Access a single value for a row/column label pair
s_df.iat[] - Access a single value for a row/column pair by integer position
Keep in mind
The indexing attributes (.loc, .iloc, .at, .iat) can be used to get and set values in the DataFrame.
The older .ix indexer was removed in pandas 1.0 and should not be used.
The .loc and .iloc indexing attributes can accept Python slice objects, but .at and .iat cannot.
.loc can also accept Boolean Series arguments
Avoid chaining in the form df[col_indexer][row_indexer]
Label slices are inclusive, integer slices exclusive
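The points above can be illustrated with a short sketch. The frame s_df, its row labels and its values are hypothetical and chosen only to make the inclusive and exclusive slicing rules visible.

```python
import pandas as pd

# Hypothetical frame with string index labels.
s_df = pd.DataFrame(
    {"OMS": [3, 6, 9, 12], "Team": ["A", "B", "A", "C"]},
    index=["r1", "r2", "r3", "r4"],
)

by_label = s_df.loc["r1":"r3"]   # label slice: the end label r3 is included
by_position = s_df.iloc[0:2]     # integer slice: position 2 is excluded

one_by_label = s_df.at["r2", "OMS"]  # single value by row/column labels
one_by_pos = s_df.iat[1, 0]          # the same cell by integer position

# Boolean Series inside .loc: select rows and set a value in one step.
s_df.loc[s_df["OMS"] >= 9, "Team"] = "Z"

print(len(by_label), len(by_position))
print(s_df)
```

Setting values through a single .loc call, as in the last step, is the reason to avoid the chained form: s_df["Team"][s_df["OMS"] >= 9] = "Z" may assign to a temporary copy and leave s_df unchanged.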
Mott MacDonald 3 February 2022
Joins
Inner Join
The inner join is one of the most common types of join we work with. It returns a dataframe with only those rows
whose join-column values appear in both dataframes.
Keep in mind
An inner join requires each row in the two joined dataframes to have matching column values. We can
think of it as the intersection of two sets.
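As a minimal sketch of the intersection idea, the two frames below share only the key values 2 and 3. The frame names, the key column and the values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frames that overlap on key values 2 and 3.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Only rows whose key appears in both frames survive.
inner = pd.merge(left, right, on="key", how="inner")
print(inner)
```

Keys 1 and 4 are dropped because each appears in only one of the two frames.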
Full Join
The full join, also called full outer join, returns all records from both dataframes, whether or not they have a
match in the other dataframe.
Keep in mind
When rows in the two dataframes do not match, the resulting dataframe has NaN in every column that comes from
the dataframe missing a matching row.
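A short sketch of the NaN behaviour, using hypothetical frames in which key 1 exists only on the left and key 4 only on the right:

```python
import pandas as pd

# Hypothetical frames: key 1 is left-only, key 4 is right-only.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Every key from either frame is kept; missing sides are filled with NaN.
outer = pd.merge(left, right, on="key", how="outer")
by_key = outer.set_index("key")

print(outer)
print(by_key.loc[1, "right_val"])  # NaN: key 1 has no row in the right frame
print(by_key.loc[4, "left_val"])   # NaN: key 4 has no row in the left frame
```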
Left Join
The left join combines the two dataframes on a common column, returning all rows from the first (left) dataframe
together with the matching rows from the second (right) dataframe. Where there is no match, the columns that come
from the second dataframe are filled with NaN.
Keep in mind
A left join is simply an inner join plus all the non-matching rows of the left dataframe, with NaN in the
columns that come from the right dataframe.
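The "inner join plus non-matching left rows" description can be checked directly. This is a sketch on hypothetical frames; the names and values are illustrative only.

```python
import pandas as pd

# Hypothetical frames: key 1 has no partner in the right frame.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

inner = left.merge(right, on="key", how="inner")
left_join = left.merge(right, on="key", how="left")

# Rows of the left frame that found no match carry NaN in right_val.
unmatched = left_join[left_join["right_val"].isna()]

print(left_join)
print(len(left_join) == len(inner) + len(unmatched))
```

The row count identity holds here because each key occurs at most once per frame; with repeated keys a merge can produce more rows than either input.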
Right Join
The right join, also called right outer join, is similar to the left outer join.
Keep in mind
The only difference is that all the rows of the right dataframe are kept as they are, together with only
those rows of the left dataframe that have a match in the right dataframe.
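One way to see the similarity is that a right join gives the same rows as a left join with the two frames swapped. The sketch below assumes two small hypothetical frames and compares the results after putting the columns in the same order.

```python
import pandas as pd

# Hypothetical frames: key 4 exists only in the right frame.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

right_join = pd.merge(left, right, on="key", how="right")
swapped_left_join = pd.merge(right, left, on="key", how="left")

# Same rows; only the column order differs, so align it before comparing.
cols = ["key", "left_val", "right_val"]
same = right_join[cols].equals(swapped_left_join[cols])

print(right_join)
print(same)
```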
Users of relational databases like SQL are familiar with the terminology used to describe join operations between
two tables (DataFrame objects).
Keep in mind
Inner join: to keep only the rows that match in both data frames, specify the argument how='inner'.
Outer join or full outer join: to keep all rows from both data frames, specify how='outer'.
Left join or left outer join: to include all the rows of your data frame x and only those from y that match, specify
how='left'.
Right join or right outer join: to include all the rows of your data frame y and only those from x that match, specify
how='right'.
on: column names to join on. They must be found in both the left and right DataFrame objects.
how: type of join to perform: 'left', 'right', 'outer' or 'inner'. The default is an inner join.
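To compare the four how values side by side, the sketch below merges the same two frames with each option and records the number of rows returned. The frames x and y follow the naming used above; their contents are hypothetical.

```python
import pandas as pd

# Hypothetical frames: x has keys 1, 2, 3, 5 and y has keys 2, 3, 4.
x = pd.DataFrame({"key": [1, 2, 3, 5], "x_val": [10, 20, 30, 50]})
y = pd.DataFrame({"key": [2, 3, 4], "y_val": [200, 300, 400]})

# Number of rows returned by each join type.
row_counts = {
    how: len(pd.merge(x, y, on="key", how=how))
    for how in ["inner", "outer", "left", "right"]
}

# Leaving out how= gives the default, which is an inner join.
default_rows = len(pd.merge(x, y, on="key"))

print(row_counts)
print(default_rows)
```

With these inputs the inner join is the smallest result and the outer join the largest, which matches the intersection and union picture of the two key sets.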
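Keep in mind
pd.concat() defaults to an outer join, with the option of an inner join (join='inner')
pd.concat() is a pandas function
pd.concat() combines pandas DataFrames vertically or horizontally
Duplicate index values can cause errors: with verify_integrity=True, pd.concat() raises an error if the result would contain duplicate index values.
Practice 03
• Merge in Python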
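The concat behaviour described above can be sketched as follows. The frames top and bottom are hypothetical; join, axis, verify_integrity and ignore_index are existing pd.concat parameters.

```python
import pandas as pd

# Hypothetical frames that share column B and both use the index 0, 1.
top = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
bottom = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

stacked = pd.concat([top, bottom])                      # vertical, outer join on columns
stacked_inner = pd.concat([top, bottom], join="inner")  # keep shared columns only
side_by_side = pd.concat([top, bottom], axis=1)         # horizontal, aligned on index

# The stacked index is 0, 1, 0, 1, so an integrity check fails.
try:
    pd.concat([top, bottom], verify_integrity=True)
    raised = False
except ValueError:
    raised = True

# ignore_index=True builds a fresh 0..n-1 index instead.
renumbered = pd.concat([top, bottom], ignore_index=True)

print(stacked)
print(raised)
```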
Keep in mind
We can convert an object (string) column to datetime
We can convert an integer column to datetime
We can convert a column to datetime when reading a CSV file
Practice 04
• Miscellaneous
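The three datetime conversions can be sketched as below. The column names, the YYYYMMDD integer layout and the inline CSV text are assumptions made for illustration; pd.to_datetime and the parse_dates argument of pd.read_csv are the pandas tools used.

```python
import io

import pandas as pd

# Hypothetical frame with dates stored as text and as YYYYMMDD integers.
df = pd.DataFrame({
    "date_text": ["2022-02-03", "2022-02-04"],
    "date_int": [20220203, 20220204],
})

# Object (string) column to datetime.
df["from_text"] = pd.to_datetime(df["date_text"], format="%Y-%m-%d")

# Integer column to datetime: convert to strings so the format applies.
df["from_int"] = pd.to_datetime(df["date_int"].astype(str), format="%Y%m%d")

# Convert while reading a CSV (an in-memory file stands in for a real one).
csv_text = "when,value\n2022-02-03,10\n2022-02-04,12\n"
from_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

print(df.dtypes)
print(from_csv.dtypes)
```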
Useful functions
findall: Returns a list containing all matches
search: Returns a Match object if there is a match anywhere in the string
split: Returns a list where the string has been split at each match
sub: Replaces one or many matches with a string
Practice 05
• Regular expressions
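The four functions listed above live in Python's re module. The sketch below applies each of them to one sample string; the string and the patterns are illustrative only.

```python
import re

text = "Practice 02, Practice 03 and Practice 04"

numbers = re.findall(r"\d+", text)         # list of every run of digits
first = re.search(r"\d+", text)            # Match object for the first run
parts = re.split(r",\s*|\s+and\s+", text)  # split on commas and on " and "
masked = re.sub(r"\d+", "#", text)         # replace every run of digits

print(numbers)
print(first.group(), first.start())
print(parts)
print(masked)
```

re.search returns None when nothing matches, so it is worth checking the result before calling .group() on real data.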
Practice 06
• Miscellaneous
Thank you