Python Crash Course: Fintech Feb. 2023

FINTECH FEB. 2023
PYTHON CRASH COURSE
TUTORIAL 4
AGENDA
PYTHON CRASH COURSE – TUTORIAL 4
• Pandas
• Dataset sources
• Exercise
PANDAS – DataFrame
• Used when working with tabular data
• You can explore & clean the data using Pandas
• A DataFrame is a table: each row is a record, each column holds one field, and the row labels on the left form the index
   Username  Age  Sex     ← columns
0  ‘John’    24   ‘M’     ← a row/record
1  ‘Jane’    32   ‘F’
2  ‘Sam’     15   ‘M’
3  ‘Max’     18   ‘M’
↑
Index
PANDAS – CREATE
• Multiple ways to create a DataFrame
• Creating from a dictionary: each key becomes a column name, and each list holds that column's values
import pandas as pd
h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M']
}
df = pd.DataFrame(h)
df
  Username  Age Sex
0     John   24   M
1     Jane   32   F
2      Sam   15   M
3      Max   18   M
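The slide notes that there are several ways to create a DataFrame but shows only the dictionary form. As a sketch, the same sample table can also be built from a list of row dictionaries or from a list of lists plus explicit column names; DataFrame.equals() is used here only to confirm that the three results match.

```python
import pandas as pd

# Way 1 (from the slide): a dictionary of columns
h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df_from_dict = pd.DataFrame(h)

# Way 2: a list of dictionaries, one dictionary per row (record)
records = [
    {'Username': 'John', 'Age': 24, 'Sex': 'M'},
    {'Username': 'Jane', 'Age': 32, 'Sex': 'F'},
    {'Username': 'Sam', 'Age': 15, 'Sex': 'M'},
    {'Username': 'Max', 'Age': 18, 'Sex': 'M'},
]
df_from_records = pd.DataFrame(records)

# Way 3: a list of lists (one inner list per row) plus the column names
rows = [
    ['John', 24, 'M'],
    ['Jane', 32, 'F'],
    ['Sam', 15, 'M'],
    ['Max', 18, 'M'],
]
df_from_rows = pd.DataFrame(rows, columns=['Username', 'Age', 'Sex'])

# All three build the same table
print(df_from_dict.equals(df_from_records))  # True
print(df_from_dict.equals(df_from_rows))     # True
```

The dictionary form is convenient when the data is already organised by column; the list forms fit data that arrives one record at a time.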
PANDAS – CREATE
• Assigning labels to the index when creating the DataFrame (one label per row)
import pandas as pd
h = {
'Username': ['John', 'Jane', 'Sam', 'Max'],
'Age': [24, 32, 15, 18],
'Sex': ['M', 'F', 'M', 'M']
}

df = pd.DataFrame(h, index=['w','x', 'y', 'z'])
df
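A minimal sketch of what the custom labels change, using the same sample data: loc looks rows up by label, iloc still works by position, and an index list whose length differs from the number of rows raises a ValueError. The variable mismatch_raised exists only for illustration.

```python
import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h, index=['w', 'x', 'y', 'z'])

print(list(df.index))            # ['w', 'x', 'y', 'z']

# loc looks a row up by its index label ...
print(df.loc['y', 'Username'])   # Sam

# ... while iloc still counts positions from 0
print(df.iloc[2]['Username'])    # Sam

# The index needs exactly one label per row
mismatch_raised = False
try:
    pd.DataFrame(h, index=['w', 'x', 'y'])   # 3 labels for 4 rows
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```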
PANDAS – TYPES
• Each column has its own data type (dtype)
• The ‘object’ dtype can hold any Python object, including strings
• Columns with mixed types are stored with the object dtype
print(df.dtypes)
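A short sketch of inspecting and converting dtypes with the slide's sample data. In pandas 1.x/2.x, text columns are reported as object; newer releases may use a dedicated string dtype instead, so only the numeric and mixed-type cases are annotated below.

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
})

print(df.dtypes)                 # one dtype per column
age_dtype = df['Age'].dtype
print(age_dtype)                 # int64

# A column mixing numbers and strings is stored with the object dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)               # object

# astype() converts a column to another dtype
df['Age'] = df['Age'].astype('float64')
print(df['Age'].dtype)           # float64
```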
PANDAS
• Get the shape (number of rows, number of columns)
print(df.shape)
• Get the column names
column_names = list(df.columns)
print(column_names)
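Note that shape is an attribute, not a method, so it is written without parentheses (df.shape() fails with a TypeError because a tuple is not callable). A small sketch with the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
})

print(df.shape)              # (4, 3)

# shape is a (rows, columns) tuple, so it can be unpacked
n_rows, n_cols = df.shape
print(n_rows, n_cols)        # 4 3

# len() returns the number of rows only
print(len(df))               # 4

column_names = list(df.columns)
print(column_names)          # ['Username', 'Age', 'Sex']
```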
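PANDAS – SELECTION
• Select rows
• iloc selects rows by integer position; the end of a slice is excluded, so 1:3 returns the rows at positions 1 and 2
df.iloc[1:3]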
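With the default 0, 1, 2, 3 index, iloc (position-based) and loc (label-based) look alike, which is exactly when their different slice ends are easy to miss. The sketch below compares them and adds boolean filtering, a common way to select rows by a condition; the age threshold is only an example.

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
})

# iloc: positions 1 and 2 (the end position 3 is excluded)
by_position = df.iloc[1:3]
print(by_position['Username'].tolist())   # ['Jane', 'Sam']

# loc: labels 1 through 3 (the end label IS included)
by_label = df.loc[1:3]
print(by_label['Username'].tolist())      # ['Jane', 'Sam', 'Max']

# Boolean filtering: keep the rows where the condition is True
adults = df[df['Age'] >= 18]
print(adults['Username'].tolist())        # ['John', 'Jane', 'Max']
```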
• Select a column
• Multiple ways
print(df.Username)
print(df['Username'])
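Attribute access (df.Username) only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method; bracket access always works. A sketch, where the 'First Name' and 'count' columns are hypothetical additions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['John', 'Jane'],
    'Age': [24, 32],
    'First Name': ['John', 'Jane'],   # hypothetical column whose name has a space
})

# For a simple name, both notations return the same Series
print(df.Username.equals(df['Username']))   # True

# A name with a space only works with brackets
print(df['First Name'].tolist())             # ['John', 'Jane']

# One name in brackets gives a Series; a list of names gives a DataFrame
print(type(df['Age']).__name__)              # Series
print(type(df[['Age']]).__name__)            # DataFrame

# Attribute access loses to existing DataFrame methods such as count()
clash = pd.DataFrame({'count': [1, 2]})
print(callable(clash.count))                 # True (the method, not the column)
print(clash['count'].tolist())               # [1, 2]
```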
• Select multiple columns
• Multiple ways
df[['Username', 'Age']]
df.iloc[:, 0:2]
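A quick sketch confirming that the two forms on the slide select the same columns for the sample data, plus a loc label slice that names the first and last column (its end is included). The positional form depends on column order, so it silently changes meaning if columns are reordered.

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
})

by_name = df[['Username', 'Age']]               # columns chosen by name
by_position = df.iloc[:, 0:2]                   # positions 0 and 1 (end excluded)
by_label_range = df.loc[:, 'Username':'Age']    # label slice (end included)

print(by_name.equals(by_position))      # True
print(by_name.equals(by_label_range))   # True
print(list(by_name.columns))            # ['Username', 'Age']
```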
• Select a cell
• Multiple ways (here 0 is the index label, which matches the row position only with the default 0, 1, 2, … index)
print(df.Username[0])
print(df['Username'][0])
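When the index is not the default one (for example the ‘w’, ‘x’, ‘y’, ‘z’ labels used earlier), plain [0] on a column no longer means "label 0", and recent pandas versions deprecate falling back to the position. A sketch of the explicit alternatives, loc/at by label and iloc/iat by position; for writing a single cell, loc or at is also safer than chained indexing such as df['Age']['y'] = 16, which may not update df.

```python
import pandas as pd

# Same data as the slides, with the custom labels from the index example
df = pd.DataFrame(
    {'Username': ['John', 'Jane', 'Sam', 'Max'], 'Age': [24, 32, 15, 18]},
    index=['w', 'x', 'y', 'z'],
)

# Label-based: row label 'w', column 'Username'
print(df.loc['w', 'Username'])   # John
print(df.at['w', 'Username'])    # John (at is the single-value version of loc)

# Position-based: row 0, column 0
print(df.iloc[0, 0])             # John
print(df.iat[0, 0])              # John

# Writing a single cell with at
df.at['y', 'Age'] = 16
print(df.loc['y', 'Age'])        # 16
```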
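PANDAS – COUNT
• value_counts() counts how often each distinct value occurs, sorted from most to least frequent
df['Sex'].value_counts()
df['Sex'].value_counts().iloc[0]   # count of the most frequent value
df['Sex'].value_counts().iloc[1]   # count of the second most frequent value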
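Because the result of value_counts() is indexed by the values themselves ('M', 'F'), a count can also be read by label. The sketch below does that and uses normalize=True to get proportions (scaled to percentages), plus nunique() to count distinct values; the data is a cut-down version of the slide's sample.

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Sex': ['M', 'F', 'M', 'M'],
})

counts = df['Sex'].value_counts()
print(counts['M'], counts['F'])    # 3 1  (count looked up by label)

# normalize=True returns proportions; multiply by 100 for percentages
percent = df['Sex'].value_counts(normalize=True) * 100
print(percent['M'], percent['F'])  # 75.0 25.0

# nunique() counts the distinct values in a column
print(df['Sex'].nunique())         # 2
```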
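PANDAS – GROUP BY
• Many ways to work with groupby
• Example: for a DataFrame with ‘Student’ and ‘Quiz_Mark’ columns, compute the average quiz mark of each student
df.groupby(['Student'])['Quiz_Mark'].mean()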
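The slide's example refers to ‘Student’ and ‘Quiz_Mark’ columns that do not appear in the earlier sample data. The sketch below assumes a small hypothetical marks table with those columns, so the grouping can be checked by hand, and then uses agg() to compute several statistics per group at once.

```python
import pandas as pd

# Hypothetical quiz results; column names match the slide
marks = pd.DataFrame({
    'Student': ['Ann', 'Ben', 'Ann', 'Ben', 'Ann'],
    'Quiz_Mark': [8, 6, 10, 7, 9],
})

# Mean per student: Ann (8 + 10 + 9) / 3 = 9.0, Ben (6 + 7) / 2 = 6.5
avg = marks.groupby(['Student'])['Quiz_Mark'].mean()
print(avg['Ann'], avg['Ben'])   # 9.0 6.5

# Several statistics per student in one call
summary = marks.groupby('Student')['Quiz_Mark'].agg(['mean', 'max', 'count'])
print(summary)
```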
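PANDAS – CLEANING
• Missing values: df = df.dropna()
• Duplicates: df = df.drop_duplicates()
• Drop column: df.drop(['Age'], axis = 1, inplace = True)
• Reset index: df = df.reset_index(drop=True) (drop=True discards the old index instead of adding it as a column)
• ‘inplace = True’ vs ‘inplace = False’: with inplace = True the DataFrame is modified directly and the method returns None; with inplace = False (the default) a new DataFrame is returned and must be assigned, e.g. df = df.drop(['Age'], axis = 1)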
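A sketch of the cleaning steps on a small hypothetical table that contains one missing value and one duplicate row. It checks for problems first (isna().sum() and duplicated().sum()), then cleans, and finally shows the inplace difference by looking at what each call returns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Sam's age is missing and Jane's row appears twice
df = pd.DataFrame({
    'Username': ['John', 'Jane', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 32, np.nan, 18],
    'Sex': ['M', 'F', 'F', 'M', 'M'],
})

print(df.isna().sum()['Age'])   # 1  (missing values in 'Age')
print(df.duplicated().sum())    # 1  (rows identical to an earlier row)

df = df.dropna()                # removes Sam's row
df = df.drop_duplicates()       # removes the second 'Jane' row
print(df.index.tolist())        # [0, 1, 4]  (gaps left by dropped rows)

df = df.reset_index(drop=True)
print(df.index.tolist())        # [0, 1, 2]

# inplace=False (default): returns a new DataFrame, df is unchanged
without_age = df.drop(['Age'], axis=1)
print('Age' in df.columns, 'Age' in without_age.columns)   # True False

# inplace=True: changes df itself and returns None
result = df.drop(['Age'], axis=1, inplace=True)
print(result, 'Age' in df.columns)                         # None False
```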
PANDAS – CONCAT
• Concatenate two DataFrames (the rows of df2 are placed below the rows of df1)
import pandas as pd
h1 = {
'Username': ['John', 'Jane', 'Sam'],
'Age': [24, 32, 15],
'Sex': ['M', 'F', 'M']
}
h2 = {
'Username': ['Harry', 'Rico', 'Anya'],
'Age': [10, 7, 9],
'Sex': ['M', 'M', 'F']
}
df1 = pd.DataFrame(h1)
df2 = pd.DataFrame(h2)
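pd.concat([df1,df2], ignore_index=True)
• ignore_index=True renumbers the rows of the result 0–5 instead of keeping each DataFrame's original 0–2 labels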
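A sketch comparing the result index with and without ignore_index, using two-column versions of the slide's DataFrames. It also shows axis=1, which places frames side by side (aligned on the index) instead of stacking rows.

```python
import pandas as pd

df1 = pd.DataFrame({'Username': ['John', 'Jane', 'Sam'], 'Age': [24, 32, 15]})
df2 = pd.DataFrame({'Username': ['Harry', 'Rico', 'Anya'], 'Age': [10, 7, 9]})

# Without ignore_index, each frame keeps its own 0, 1, 2 labels
kept = pd.concat([df1, df2])
print(kept.index.tolist())        # [0, 1, 2, 0, 1, 2]

# With ignore_index=True, the result is renumbered 0..5
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]

# axis=1 puts the frames next to each other: 3 rows, 2 + 2 columns
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.shape)         # (3, 4)
```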
PANDAS – READ
• Most of the time you will not create a DataFrame from scratch; instead, you will work with data that already exists, for example in a file.
• Read a CSV file
• CSV = comma-separated values
• If needed, use the file’s full path
import pandas as pd
df = pd.read_csv('/content/legislators-historical.csv')
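read_csv accepts a file path or any file-like object. To keep the sketch runnable without the course file, it reads a small hypothetical CSV held in memory through io.StringIO. Other read_csv parameters worth knowing include sep (e.g. sep=';' for semicolon-separated files) and usecols, which loads only the listed columns.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Username,Age,Sex
John,24,M
Jane,32,F
Sam,15,M
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)            # (3, 3)

# usecols: load only some of the columns
ages = pd.read_csv(io.StringIO(csv_text), usecols=['Username', 'Age'])
print(list(ages.columns))  # ['Username', 'Age']
```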
PANDAS – DISPLAY
• head(): displays the first 5 rows
df.head()
• tail(): displays the last 5 rows
df.tail()
[Screenshot: the result after executing ‘df.head()’]
PANDAS – DISPLAY
• When using head or tail, you can specify the number of rows to display
df.head(6)
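A sketch on a hypothetical 20-row DataFrame (a single column numbered 0 to 19), so it is easy to see which rows head() and tail() return. A negative n for head() returns every row except the last |n|.

```python
import pandas as pd

df = pd.DataFrame({'n': range(20)})   # rows numbered 0..19

print(df.head()['n'].tolist())     # [0, 1, 2, 3, 4]  (default: 5 rows)
print(df.tail()['n'].tolist())     # [15, 16, 17, 18, 19]
print(df.head(6)['n'].tolist())    # [0, 1, 2, 3, 4, 5]
print(df.tail(3)['n'].tolist())    # [17, 18, 19]

# Negative n: all rows except the last 3
print(len(df.head(-3)))            # 17
```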
PANDAS – STATISTICS
• Descriptive statistics: by default, describe() summarizes the numeric columns (count, mean, std, min, 25%, 50%, 75%, max)
df.describe()
PANDAS – STATISTICS
• You can transpose the output, so that each column becomes a row and each statistic a column
df.describe().transpose()
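A sketch of describe() on a cut-down version of the sample data. The mean can be checked by hand: (24 + 32 + 15 + 18) / 4 = 22.25. .T is shorthand for .transpose(), and include='all' also summarizes text columns (count, unique, top, freq).

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
})

stats = df.describe()              # numeric columns only: here just 'Age'
print(stats.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(stats.loc['mean', 'Age'])    # 22.25

# Transposed: one row per column, one column per statistic
print(df.describe().T.loc['Age', 'max'])   # 32.0

# include='all' adds text-column statistics such as the number of unique values
print(df.describe(include='all').loc['unique', 'Username'])   # 4
```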
PANDAS – EXPORT
• Export a DataFrame to an Excel file or a CSV file
df.to_excel('output1.xlsx')
df.to_csv('output2.csv')
df.to_csv('output2.csv', index = False)
• When exporting to CSV, use ‘index = False’ to prevent a column with the index values from being written.
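A sketch of why index = False matters: it writes the same small DataFrame with and without the index into a temporary directory and reads both files back. The unnamed index column comes back as ‘Unnamed: 0’. Writing .xlsx files with to_excel additionally needs an Excel writer package such as openpyxl to be installed, so the sketch sticks to CSV.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Username': ['John', 'Jane'], 'Age': [24, 32]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, 'with_index.csv')
    without_index = os.path.join(tmp, 'without_index.csv')

    df.to_csv(with_index)                  # default: index written as first column
    df.to_csv(without_index, index=False)  # index left out

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)      # ['Unnamed: 0', 'Username', 'Age']
print(cols_without)   # ['Username', 'Age']
```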
DATASET RESOURCES
• Kaggle Datasets
• UCI Machine Learning Repository
• datasetsearch.research.google.com
EXERCISE
Given the “legislators-historical.csv” file.
1. Display the first and last 8 records.
2. Display the shape of the DataFrame.
3. Generate descriptive statistics.
4. Check if there are missing values in the columns.
5. Check for duplicate rows.
Continued on next slide …

EXERCISE
6. Drop the ‘middle_name’ column.
7. Drop rows with missing values and drop duplicate rows.
8. Extract rows with people who are still alive.
9. Create a new column ‘age’ and compute the age from the ‘birthyear’ column.
10. Drop the ‘birthyear’ and ‘alive’ columns.
11. Find the number of unique parties.
12. Find the percentage of males and females.
13. Export the DataFrame as a CSV file named ‘alive.csv’.
REFERENCES
• pandas.pydata.org/docs/getting_started/index.html
• www.kaggle.com/learn/pandas
• github.com/unitedstates/congress-legislators
THANK YOU
