
Basic Data Manipulation with Pandas

Basic Operation

• Selection
• Filtering
• Addition
• Deletion
• Rename
• Sorting
Data Selection
Selection by Column
Data Selection
Selection by Row
.loc vs .iloc
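As a minimal sketch of column selection, row selection, and the difference between .loc and .iloc. The DataFrame, its column names, and its index labels below are hypothetical sample data, not taken from the slides:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Budi", "Citra"],
        "age": [23, 35, 41],
        "city": ["Bandung", "Jakarta", "Medan"],
    },
    index=["a", "b", "c"],
)

# Selection by column
ages = df["age"]                # one column name -> Series
subset = df[["name", "city"]]   # list of column names -> DataFrame

# Selection by row
row_by_label = df.loc["b"]      # .loc selects by index label
row_by_position = df.iloc[1]    # .iloc selects by integer position

# Label slices include the end label; position slices exclude the end position.
by_label = df.loc["a":"b", ["name", "age"]]
by_position = df.iloc[0:2, 0:2]
```

Here both row lookups return the same record, and both slices return the same two rows and two columns, because label "b" happens to sit at position 1.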
Data Filtering
Conditional Selection
Combine multiple conditions with the & operator to select rows from a pandas DataFrame. Wrap each condition in parentheses.
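A short sketch of conditional selection, assuming a hypothetical DataFrame with "age" and "city" columns:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Budi", "Citra"],
        "age": [23, 35, 41],
        "city": ["Jakarta", "Jakarta", "Medan"],
    }
)

# Single condition: a boolean mask selects the rows where it is True.
over_30 = df[df["age"] > 30]

# Multiple conditions: each one needs its own parentheses,
# because & binds more tightly than comparison operators.
over_30_in_jakarta = df[(df["age"] > 30) & (df["city"] == "Jakarta")]
```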
Data Addition
Column Addition

To bring in a new column from another DataFrame, use the .merge() method.
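The sketch below shows one way to add columns directly and then pull a column in from a second DataFrame with .merge(). The column names and the shared "id" key are assumptions made for the example:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"id": [1, 2, 3], "price": [10.0, 20.0, 30.0]})

# Add a column from a list (its length must match the number of rows).
df["qty"] = [2, 1, 4]

# Add a column computed from existing columns.
df["total"] = df["price"] * df["qty"]

# A second DataFrame that shares the "id" key.
other = pd.DataFrame({"id": [1, 2, 3], "category": ["A", "B", "A"]})

# A left merge keeps every row of df and attaches the matching "category".
merged = df.merge(other, on="id", how="left")
```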
Data Addition
Row Addition
If the data contains many columns, adding rows by hand is not advisable. Instead, put the new rows in their own DataFrame and combine the two with the pd.concat() function.
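A minimal sketch of row addition with pd.concat(), using hypothetical data:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ana", "Budi"], "age": [23, 35]})
new_rows = pd.DataFrame({"name": ["Citra"], "age": [41]})

# Stack the two frames vertically; ignore_index=True renumbers the rows 0..n-1.
combined = pd.concat([df, new_rows], ignore_index=True)
```

Without ignore_index=True, the appended row would keep its original index label 0, which duplicates the first row's label.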
Data Deletion
Column Deletion

To delete one or more columns, pass the column name(s) to .drop() and set axis=1. By default .drop() returns a new DataFrame and leaves the original unchanged; to modify the original in place, set inplace=True.
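A short sketch of both behaviours, assuming a hypothetical three-column DataFrame:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Returns a new DataFrame without "b"; df itself still has all three columns.
without_b = df.drop("b", axis=1)
columns_before_inplace = list(df.columns)

# inplace=True modifies df directly and returns None.
df.drop(["a", "c"], axis=1, inplace=True)
```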
Data Rename

Rename columns by mapping old names to new names with a dictionary of the form {"old_column_name": "new_column_name", …}, passed as the columns argument of .rename(). By default a new DataFrame is returned; to rename in place, set inplace=True.
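A minimal sketch of renaming with a dictionary; the abbreviated column names are hypothetical:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"nm": ["Ana"], "ag": [23]})

# Returns a renamed copy; df keeps its original column names.
renamed = df.rename(columns={"nm": "name", "ag": "age"})

# inplace=True changes df itself. Columns missing from the mapping are left alone.
df.rename(columns={"nm": "name"}, inplace=True)
```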
Data Rename

Rename columns by passing a function to .rename(). The function is applied to every column name, and its return value becomes the new name.
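A short sketch of function-based renaming, using a built-in string method and a lambda. The column names are hypothetical:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"First Name": ["Ana"], "Age": [23]})

# str.upper is called once per column name.
upper = df.rename(columns=str.upper)

# Any callable that takes a name and returns a name works, for example a lambda.
snake = df.rename(columns=lambda c: c.lower().replace(" ", "_"))
```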
Data Sorting
The pandas sort_values() method sorts a DataFrame in ascending or descending order by the values of the given column(s).
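A minimal sketch of sorting in both directions, assuming a hypothetical "age" column:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ana", "Budi", "Citra"], "age": [35, 23, 41]})

# Ascending order is the default.
ascending = df.sort_values("age")

# Pass ascending=False for descending order.
descending = df.sort_values("age", ascending=False)
```

Both calls return a new, sorted DataFrame; the original row order of df is unchanged.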
Pandas Functionality

• Basic Function
• Data Engineering
Basic Function
Head
Use head(n) to return the first n records
df.head(7)  # returns the first 7 records
Basic Function
Tail
Use tail(n) to return the last n records
df.tail(5)  # returns the last 5 records
Basic Function
A DataFrame also has key attributes, such as:
shape: the dimensions of the DataFrame, as a (rows, columns) tuple
columns: the column names of the DataFrame
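As a quick illustration of these two attributes on hypothetical data (note that they are attributes, so they take no parentheses):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ana", "Budi", "Citra"], "age": [23, 35, 41]})

n_rows, n_cols = df.shape          # tuple of (number of rows, number of columns)
column_names = list(df.columns)    # df.columns is an Index; list() gives plain names
```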
Basic Function
Describe
For a quick summary of a DataFrame, use the describe method. For each numeric column it reports the count, mean, standard deviation, minimum, maximum, and several percentiles:
df.describe()
Basic Function
DataFrame also offers a number of statistical functions, such as:
● mean(): mean values. Also available: median(), mode()
● min(): minimum value. Also available: max(), count()
● std(): standard deviation
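A short sketch of these statistical functions applied to one column. The "score" column and its values are hypothetical:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"score": [70, 80, 80, 90]})
col = df["score"]

mean_score = col.mean()
median_score = col.median()
mode_scores = col.mode()      # a Series, since several values can tie for the mode
min_score = col.min()
max_score = col.max()
n_scores = col.count()        # number of non-missing values
std_score = col.std()         # sample standard deviation (divides by n - 1)
```

The same methods can be called on the whole DataFrame, in which case they are computed per column.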
Data Engineering
Missing Values Checking
.isnull() - returns a DataFrame of the same shape, with True where a value is missing (NaN) and False elsewhere
.notnull() - returns the opposite: True where a value is present and False where it is missing
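A minimal sketch of checking for missing values, including two common follow-ups (counting them per column and selecting the affected rows). The data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in each column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

# Boolean DataFrame, same shape as df: True marks a missing value.
mask = df.isnull()

# Summing the mask counts missing values per column.
missing_per_column = df.isnull().sum()

# Rows that contain at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]

# Rows in which every value is present.
complete_rows = df[df.notnull().all(axis=1)]
```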
Data Engineering
Dropping and Filling the Missing Values:
.dropna() - drops the rows that contain missing values; specify axis=1 to drop columns instead, and inplace=True to modify the DataFrame in place
.fillna() - fills the missing values with a value you supply (a single value or a per-column dictionary); unlike .dropna(), a plain value fill does not need axis. Specify inplace=True to modify the DataFrame in place
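A short sketch of dropping and filling missing values on hypothetical data. Filling with the column mean is just one illustrative choice of fill value:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column "a" has one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Default (axis=0): drop every row that contains a missing value.
rows_dropped = df.dropna()

# axis=1: drop every column that contains a missing value.
cols_dropped = df.dropna(axis=1)

# Fill every missing value with a constant.
filled_zero = df.fillna(0)

# Fill per column with a dictionary, here using the mean of "a".
filled_mean = df.fillna({"a": df["a"].mean()})
```

All four calls return new DataFrames, so df still contains its NaN afterwards.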
Data Engineering
Computing Unique Values
.unique() - returns the unique values of a column
.value_counts() - returns how many times each unique value occurs
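A minimal sketch of both methods on a hypothetical "city" column:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"city": ["Jakarta", "Medan", "Jakarta", "Bandung", "Jakarta"]})

# Array of distinct values, in order of first appearance.
unique_cities = df["city"].unique()

# Series indexed by value, sorted from most to least frequent.
counts = df["city"].value_counts()

# Number of distinct values.
n_unique = df["city"].nunique()
```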
