
FINTECH FEB. 2023

PYTHON
CRASH COURSE
TUTORIAL 4
AGENDA
PYTHON CRASH COURSE – TUTORIAL 4

• Pandas

• Dataset sources

• Exercise

PANDAS DataFrame

• Used when working with tabular data

• You can explore & clean the data using Pandas

• A DataFrame is a table: each row is a record, each column is a field, and the index labels the rows

Index  Username  Age  Sex
0      ‘John’    24   ‘M’
1      ‘Jane’    32   ‘F’
2      ‘Sam’     15   ‘M’
3      ‘Max’     18   ‘M’

PANDAS – CREATE

• Multiple ways to create a DataFrame

• Creating from a dictionary

   Username  Age  Sex
0  ‘John’    24   ‘M’
1  ‘Jane’    32   ‘F’
2  ‘Sam’     15   ‘M’
3  ‘Max’     18   ‘M’

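import pandas as pd

# Each key becomes a column name; each list holds that column's values.
h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}

df = pd.DataFrame(h)
df  # in a notebook, a bare name on the last line displays the DataFrame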
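Since the slide mentions multiple ways to create a DataFrame, here is a minimal sketch of two other common ones: a list of dictionaries (one per row) and a list of rows with explicit column names. The variable names are illustrative only.

```python
import pandas as pd

# One dictionary per row (record); the keys become the column names.
records = [
    {'Username': 'John', 'Age': 24, 'Sex': 'M'},
    {'Username': 'Jane', 'Age': 32, 'Sex': 'F'},
]
df_from_records = pd.DataFrame(records)

# A list of rows plus an explicit list of column names.
rows = [['Sam', 15, 'M'], ['Max', 18, 'M']]
df_from_rows = pd.DataFrame(rows, columns=['Username', 'Age', 'Sex'])

print(df_from_records)
print(df_from_rows)
```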

PANDAS – CREATE

• Assigning values to the index when creating the DataFrame.

import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}

# index supplies one label per row, in order.
df = pd.DataFrame(h, index=['w', 'x', 'y', 'z'])
df
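To see what the custom index changes, the following sketch looks up the same row in two ways: by its label with loc and by its position with iloc. It reuses the data from the slide.

```python
import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h, index=['w', 'x', 'y', 'z'])

print(df.index.tolist())  # the row labels
print(df.loc['x'])        # row selected by its label
print(df.iloc[1])         # the same row selected by its position
```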

PANDAS – TYPES

• Each column has its own datatype (dtype)

• The ‘object’ dtype can hold any Python object, including strings.

• Columns with mixed types are stored with the object dtype.

print(df.dtypes)
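A small sketch of how dtypes show up in practice. The 'Height' and 'Mixed' columns are hypothetical and exist only to contrast a uniform column with a mixed one.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [24, 32, 15],           # all integers
    'Height': [1.80, 1.65, 1.72],  # all floats (hypothetical column)
    'Mixed': [1, 'two', 3.0],      # mixed types (hypothetical column)
})

# One dtype per column; the mixed column falls back to object.
print(df.dtypes)
```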

PANDAS

• Get shape

print(df.shape)

• Get column names

column_names = list(df.columns)
print(column_names)
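As a quick illustration, shape is a (rows, columns) tuple, so it can be unpacked into two variables. The sketch below uses the four-row example table from the earlier slides.

```python
import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h)

n_rows, n_cols = df.shape        # shape is (number of rows, number of columns)
column_names = list(df.columns)

print(n_rows, n_cols)
print(column_names)
```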

PANDAS – SELECTION

• Select rows

• iloc selects rows by integer position; the end of the slice is excluded

df.iloc[1:3]
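The sketch below contrasts position-based slicing (iloc) with label-based slicing (loc) on the example table with the 'w' to 'z' index. Note that a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h, index=['w', 'x', 'y', 'z'])

by_position = df.iloc[1:3]   # positions 1 and 2 (position 3 is excluded)
by_label = df.loc['x':'y']   # labels 'x' through 'y' (end label included)

print(by_position)
print(by_label)
```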

PANDAS – SELECTION

• Select a column

• Multiple ways

print(df.Username)

print(df['Username'])

PANDAS – SELECTION

• Select multiple columns

• Multiple ways

df[['Username', 'Age']]

df.iloc[:, 0:2]
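One detail worth seeing in code: selecting a single column returns a Series, while selecting with a list of names returns a DataFrame. A minimal sketch on the example table:

```python
import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h)

one_column = df['Username']             # a single name gives a Series
two_columns = df[['Username', 'Age']]   # a list of names gives a DataFrame
by_position = df.iloc[:, 0:2]           # all rows, columns at positions 0 and 1

print(type(one_column).__name__)
print(type(two_columns).__name__)
```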

PANDAS – SELECTION

• Select a cell

• Multiple ways

print(df.Username[0])

print(df['Username'][0])
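The two forms on this slide look up the label 0, which works with the default integer index. When the index holds other labels, such as 'w' to 'z', a sketch like the following selects one cell explicitly by label or by position:

```python
import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h, index=['w', 'x', 'y', 'z'])

cell_loc = df.loc['w', 'Username']   # row label, column name
cell_iloc = df.iloc[0, 0]            # row position, column position
cell_at = df.at['w', 'Username']     # like loc, for a single value
cell_iat = df.iat[0, 0]              # like iloc, for a single value

print(cell_loc, cell_iloc, cell_at, cell_iat)
```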

PANDAS – COUNT

• value_counts()

df['Sex'].value_counts()

df['Sex'].value_counts().iloc[0]

df['Sex'].value_counts().iloc[1]
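The result of value_counts() is a Series indexed by the distinct values, sorted from most to least frequent. The sketch below reads counts by label and by position, and uses normalize=True to get proportions instead of counts.

```python
import pandas as pd

df = pd.DataFrame({'Sex': ['M', 'F', 'M', 'M']})

counts = df['Sex'].value_counts()
males = counts['M']          # look up a count by its label
most_common = counts.iloc[0] # look up a count by its position

shares = df['Sex'].value_counts(normalize=True)  # proportions that sum to 1

print(counts)
print(shares)
```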

PANDAS – GROUP BY

• Many ways to work with groupby

df.groupby(['Student'])['Quiz_Mark'].mean()
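The groupby line above refers to 'Student' and 'Quiz_Mark' columns that are not in the earlier example table. A minimal sketch with made-up quiz data (the names and marks are hypothetical) shows what it computes: one mean per student.

```python
import pandas as pd

# Hypothetical data: two quiz marks per student.
quiz = pd.DataFrame({
    'Student': ['Ann', 'Ben', 'Ann', 'Ben'],
    'Quiz_Mark': [8, 6, 10, 7],
})

# Split the rows by student, then average each student's marks.
mean_marks = quiz.groupby(['Student'])['Quiz_Mark'].mean()

print(mean_marks)
```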
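
PANDAS – CLEANING

• Missing values: df = df.dropna()

• Duplicates: df = df.drop_duplicates()

• Drop column: df.drop(['Age'], axis = 1, inplace = True)

• Reset index: df = df.reset_index(drop=True)

• ‘inplace = True’ vs ‘inplace = False’: with ‘inplace = True’ the DataFrame is modified directly and the call returns None; with ‘inplace = False’ (the default) the call returns a new DataFrame and leaves the original unchanged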
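The cleaning steps can be sketched on a tiny table with one missing value and one duplicate row (the data is made up for illustration). The last part contrasts the two inplace settings.

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['John', 'Jane', 'Jane', 'Sam'],
    'Age': [24, 32, 32, None],   # Sam's age is missing
})

cleaned = df.dropna()                     # removes the row with the missing Age
cleaned = cleaned.drop_duplicates()       # removes the repeated Jane row
cleaned = cleaned.reset_index(drop=True)  # renumbers the rows 0, 1, ...

# inplace = False (default): a new DataFrame is returned, df is unchanged.
without_age = df.drop(['Age'], axis=1)

# inplace = True: the DataFrame itself is modified and None is returned.
modified = df.copy()
returned = modified.drop(['Age'], axis=1, inplace=True)

print(cleaned)
print(returned)
```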

PANDAS – CONCAT

• Concatenate two DataFrames

import pandas as pd

h1 = {
    'Username': ['John', 'Jane', 'Sam'],
     'Age': [24, 32, 15],
     'Sex': ['M', 'F', 'M']
     }

h2 = {
    'Username': ['Harry', 'Rico', 'Anya'],
     'Age': [10, 7, 9],
     'Sex': ['M', 'M', 'F']
     }

df1 = pd.DataFrame(h1)
df2 = pd.DataFrame(h2)
pd.concat([df1,df2], ignore_index=True)
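To show why ignore_index=True is used above, this sketch concatenates two small tables with and without it. Without it, each table keeps its own row labels, so labels repeat.

```python
import pandas as pd

df1 = pd.DataFrame({'Username': ['John', 'Jane'], 'Age': [24, 32]})
df2 = pd.DataFrame({'Username': ['Harry', 'Rico'], 'Age': [10, 7]})

kept = pd.concat([df1, df2])                           # original labels kept
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh labels 0..n-1

print(kept.index.tolist())
print(renumbered.index.tolist())
```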

PANDAS – READ

• Most of the time you will not create a DataFrame from scratch; instead, you will work with data that already exists.

• Read a CSV file

• CSV = comma-separated values

• If needed, use the file’s full path

import pandas as pd

df = pd.read_csv('/content/legislators-historical.csv')
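To try read_csv without the file at hand, the sketch below reads CSV text from memory through io.StringIO. The column names and rows are hypothetical stand-ins, not the contents of the real file.

```python
import io
import pandas as pd

# Hypothetical CSV content: a header line followed by one line per row.
csv_text = (
    "last_name,first_name,birthyear\n"
    "Smith,Ann,1950\n"
    "Jones,Bob,1962\n"
)

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```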

PANDAS – DISPLAY

• head → display the first 5 rows

df.head()

• tail → display the last 5 rows

df.tail()

[Screenshot: result of executing ‘df.head()’]

PANDAS – DISPLAY

• When using head or tail, you can specify the number of rows to display

df.head(6)
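A minimal sketch of head and tail on a ten-row table with a single made-up column 'n', so the selected rows are easy to recognise:

```python
import pandas as pd

df = pd.DataFrame({'n': range(10)})  # rows 0 to 9

first_five = df.head()    # default is 5 rows
first_three = df.head(3)
last_two = df.tail(2)

print(first_three)
print(last_two)
```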

PANDAS – STATISTICS

• Descriptive statistics (by default: count, mean, std, min, quartiles and max of the numeric columns)

df.describe()

PANDAS – STATISTICS

• You can transpose the output

df.describe().transpose()
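As an illustration on the example table: describe() summarises only the numeric 'Age' column here, with one statistic per row. Transposing gives one row per column and one statistic per column, which is easier to read when there are many columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
})

summary = df.describe()            # statistics as rows, columns as columns
transposed = summary.transpose()   # columns as rows, statistics as columns

print(summary)
print(transposed)
```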

PANDAS – EXPORT

• Export a DataFrame to an Excel file or a CSV file

df.to_excel('output1.xlsx')

df.to_csv('output2.csv')

df.to_csv('output2.csv', index = False)

• When exporting to CSV, use ‘index = False’ to prevent a column with index values from being created.
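The effect of index = False can be checked by writing a small table both ways and reading the files back. This sketch writes into a temporary folder; the file names are illustrative.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Username': ['John', 'Jane'], 'Age': [24, 32]})

with tempfile.TemporaryDirectory() as folder:
    with_index = os.path.join(folder, 'with_index.csv')
    without_index = os.path.join(folder, 'without_index.csv')

    df.to_csv(with_index)                  # index written as an extra column
    df.to_csv(without_index, index=False)  # only the data columns are written

    columns_with = list(pd.read_csv(with_index).columns)
    columns_without = list(pd.read_csv(without_index).columns)

print(columns_with)     # ['Unnamed: 0', 'Username', 'Age']
print(columns_without)  # ['Username', 'Age']
```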

DATASET RESOURCES

• Kaggle Datasets

• UCI Machine Learning Repository

• datasetsearch.research.google.com

EXERCISE

Given the “legislators-historical.csv” file:

1. Display the first and last 8 records.

2. Display the shape of the DataFrame.

3. Generate descriptive statistics.

4. Check if there are missing values in the columns.

5. Check for duplicate rows.

6. Drop the ‘middle_name’ column.

7. Drop rows with missing values and drop duplicate rows.

8. Extract rows with people who are still alive.

9. Create a new column ‘age’ and deduce the age from the ‘birthyear’ column.

10. Drop the ‘birthyear’ and ‘alive’ columns.

11. Find the number of unique parties.

12. Find the percentage of males and females.

13. Export the DataFrame as a CSV file named ‘alive.csv’.
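Some steps of the exercise use calls that the earlier slides do not show (isna, duplicated, boolean filtering, nunique). The sketch below demonstrates them on a tiny made-up table, not on the real file. The rows, the 'gender' and 'party' column names, the boolean type of 'alive', and the reference year 2023 are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the exercise file.
df = pd.DataFrame({
    'first_name': ['Ann', 'Bob', 'Bob', 'Cy'],
    'middle_name': [None, 'J.', 'J.', None],
    'gender': ['F', 'M', 'M', 'M'],
    'party': ['A', 'B', 'B', 'A'],
    'birthyear': [1950, 1962, 1962, 1930],
    'alive': [True, True, True, False],   # assumed to be boolean
})

missing_per_column = df.isna().sum()   # number of missing values per column
n_duplicates = df.duplicated().sum()   # number of rows that repeat an earlier row

df = df.drop(['middle_name'], axis=1)
df = df.dropna().drop_duplicates()

alive = df[df['alive']].copy()              # keep rows where 'alive' is True
alive['age'] = 2023 - alive['birthyear']    # 2023 is an assumed reference year

n_parties = alive['party'].nunique()        # number of distinct parties
gender_pct = alive['gender'].value_counts(normalize=True) * 100

print(missing_per_column)
print(n_duplicates, n_parties)
print(gender_pct)
```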

REFERENCES

• pandas.pydata.org/docs/getting_started/index.html

• www.kaggle.com/learn/pandas

• github.com/unitedstates/congress-legislators

THANK YOU