Python Crash Course: Fintech Feb. 2023
2023
PYTHON
CRASH COURSE
TUTORIAL 4
AGENDA
PYTHON CRASH COURSE – TUTORIAL 4
• Pandas
• Dataset sources
• Exercise
PANDAS DataFrame
Index  Username  Age  Sex
0      ‘John’    24   ‘M’
1      ‘Jane’    32   ‘F’
2      ‘Sam’     15   ‘M’
3      ‘Max’     18   ‘M’
import pandas as pd
h = {
'Username': ['John', 'Jane', 'Sam', 'Max'],
'Age': [24, 32, 15, 18],
'Sex': ['M', 'F', 'M', 'M']
}
df = pd.DataFrame(h)
df
PANDAS – CREATE
import pandas as pd
h = {
'Username': ['John', 'Jane', 'Sam', 'Max'],
'Age': [24, 32, 15, 18],
'Sex': ['M', 'F', 'M', 'M']
}
df = pd.DataFrame(h, index=['w','x', 'y', 'z'])
df
PANDAS – TYPES
including strings.
• Get shape
print(df.shape)
column_names = list(df.columns)
print(column_names)
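The inspection calls above can be combined with df.dtypes to see which type pandas inferred for each column. This is a minimal sketch that reuses the sample data from the earlier slides. The dtype reported for text columns depends on the pandas version, so it is only printed here, not relied on.

```python
import pandas as pd

# Same sample data as on the earlier slides
h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h)

print(df.dtypes)                 # one inferred dtype per column
n_rows, n_cols = df.shape        # shape is a (rows, columns) tuple
column_names = list(df.columns)  # column labels as a plain list

# Check the numeric column without depending on the exact dtype name
age_is_integer = pd.api.types.is_integer_dtype(df['Age'])
print(n_rows, n_cols, column_names, age_is_integer)
```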
PANDAS – SELECTION
• Select rows
• iloc
df.iloc[1:3]
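iloc selects by integer position, and its slices exclude the end position, as in ordinary Python slicing. The sketch below contrasts it with label-based loc, whose slices include the end label. It assumes the custom index ['w', 'x', 'y', 'z'] from the CREATE slide.

```python
import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h, index=['w', 'x', 'y', 'z'])

by_position = df.iloc[1:3]   # positions 1 and 2; position 3 is excluded
by_label = df.loc['x':'y']   # labels 'x' through 'y'; the end label is included

print(by_position)
print(by_position.equals(by_label))
```

Both selections return the rows for Jane and Sam, even though one slice ends at 3 and the other at 'y'.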
PANDAS – SELECTION
• Select a column
• Multiple ways
print(df.Username)
print(df['Username'])
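The two forms are not always interchangeable. Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. The sketch below uses a hypothetical frame whose column names are chosen to show both cases.

```python
import pandas as pd

# Hypothetical frame: one name contains a space, one collides with a method
df = pd.DataFrame({
    'First Name': ['John', 'Jane'],
    'count': [3, 5],
})

first_names = df['First Name']  # brackets work for any column name
counts = df['count']            # the column
method = df.count               # NOT the column: this is the DataFrame.count method

print(list(first_names), list(counts), callable(method))
```

For that reason the bracket form is usually the safer default.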
PANDAS – SELECTION
• Multiple ways
df[['Username', 'Age']]
df.iloc[:, 0:2]
PANDAS – SELECTION
• Select a cell
• Multiple ways
print(df.Username[0])
print(df['Username'][0])
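The chained forms above look up the label 0, so they depend on the default integer index. With a custom index such as the one from the CREATE slide, a single-step accessor is more predictable. This is a minimal sketch, not the only way to do it.

```python
import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h, index=['w', 'x', 'y', 'z'])

by_label = df.loc['w', 'Username']   # row label, column label
scalar_label = df.at['w', 'Username']  # like loc, but for a single cell only
by_position = df.iat[0, 0]           # row position, column position

print(by_label, scalar_label, by_position)
```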
PANDAS – COUNT
• value_counts()
df['Sex'].value_counts()
df['Sex'].value_counts().iloc[0]
df['Sex'].value_counts().iloc[1]
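value_counts() returns a Series indexed by the distinct values and sorted by count, so an individual count can also be read by its label. The sketch below reuses the 'Sex' column from the earlier slides and shows the normalize option for proportions.

```python
import pandas as pd

df = pd.DataFrame({'Sex': ['M', 'F', 'M', 'M']})

counts = df['Sex'].value_counts()
n_male = counts['M']     # look up a count by its label
n_female = counts['F']

# normalize=True returns proportions instead of absolute counts
shares = df['Sex'].value_counts(normalize=True)

print(n_male, n_female, shares['M'])
```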
PANDAS – GROUP BY
df.groupby(['Student'])['Quiz_Mark'].mean()
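The line above assumes a DataFrame with 'Student' and 'Quiz_Mark' columns, which the earlier slides do not define. The sketch below builds a small hypothetical one (the names and marks are made up) so the grouping can be run end to end.

```python
import pandas as pd

# Hypothetical quiz results; only the column names come from the slide
df = pd.DataFrame({
    'Student': ['Ann', 'Ben', 'Ann', 'Ben', 'Ann'],
    'Quiz_Mark': [8, 6, 10, 7, 9],
})

# Split the rows by student, then average each student's marks
mean_marks = df.groupby('Student')['Quiz_Mark'].mean()
print(mean_marks)
```

The result is a Series indexed by student, with one mean per group.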
PANDAS – CLEANING
• Duplicates
df = df.drop_duplicates()
• Concatenate two DataFrames
import pandas as pd
h1 = {
'Username': ['John', 'Jane', 'Sam'],
'Age': [24, 32, 15],
'Sex': ['M', 'F', 'M']
}
h2 = {
'Username': ['Harry', 'Rico', 'Anya'],
'Age': [10, 7, 9],
'Sex': ['M', 'M', 'F']
}
df1 = pd.DataFrame(h1)
df2 = pd.DataFrame(h2)
pd.concat([df1,df2], ignore_index=True)
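The two cleaning steps often go together, because concatenating overlapping sources is a common way for duplicates to appear. The sketch below uses hypothetical data in which one row occurs in both frames.

```python
import pandas as pd

# Hypothetical data: the 'Jane' row appears in both frames with identical values
df1 = pd.DataFrame({'Username': ['John', 'Jane'], 'Age': [24, 32]})
df2 = pd.DataFrame({'Username': ['Jane', 'Harry'], 'Age': [32, 10]})

# ignore_index=True renumbers the rows 0..3 instead of repeating 0, 1, 0, 1
combined = pd.concat([df1, df2], ignore_index=True)

# Drop fully identical rows, then renumber the remaining rows
deduplicated = combined.drop_duplicates().reset_index(drop=True)
print(deduplicated)
```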
PANDAS – READ
• Most of the time you will not create a DataFrame from scratch; instead, you
will work with data that is already available.
import pandas as pd
df = pd.read_csv('/content/legislators-historical.csv')
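read_csv also accepts a file-like object, which makes it possible to try the call without having the file at hand. The sketch below uses a small in-memory CSV. Its column names and rows are hypothetical and are not taken from the legislators file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """name,birthyear
Ann,1950
Ben,1962
"""

# StringIO wraps the text so read_csv can treat it like an open file
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(list(df.columns))
```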
PANDAS – DISPLAY
• tail() displays the last 5 rows by default
df.tail()
• With head() or tail(), you can specify the number of rows to display
df.head(6)
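The sketch below shows the default and an explicit row count on a hypothetical ten-row frame, so the effect is easy to check.

```python
import pandas as pd

# Hypothetical frame with ten rows numbered 0..9
df = pd.DataFrame({'n': range(10)})

first_default = df.head()  # no argument: the first 5 rows
last_two = df.tail(2)      # explicit count: the last 2 rows

print(len(first_default), list(last_two['n']))
```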
PANDAS – STATISTICS
• Descriptive statistics
df.describe()
PANDAS – STATISTICS
df.describe().transpose()
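By default, describe() summarizes only the numeric columns when a frame has any. The sketch below reuses the sample data from the earlier slides. It also shows the include='all' option, which adds the text columns, and the transposed layout with one row per column.

```python
import pandas as pd

h = {
    'Username': ['John', 'Jane', 'Sam', 'Max'],
    'Age': [24, 32, 15, 18],
    'Sex': ['M', 'F', 'M', 'M'],
}
df = pd.DataFrame(h)

numeric_summary = df.describe()             # only 'Age' is numeric
full_summary = df.describe(include='all')   # also summarizes the text columns
transposed = numeric_summary.transpose()    # one row per described column

print(numeric_summary)
print(transposed)
```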
PANDAS – EXPORT
df.to_excel('output1.xlsx')
df.to_csv('output2.csv')
df.to_csv('output2.csv', index=False)
• When exporting to CSV, pass index=False so that the index values are not written as an extra column.
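The effect of the index argument is easiest to see in the header line of the output. In the sketch below, to_csv is called without a file name, in which case it returns the CSV text instead of writing a file. The two-row frame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Username': ['John', 'Jane'], 'Age': [24, 32]})

# With no path argument, to_csv returns the CSV content as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index column
print(without_index.splitlines()[0])  # header lists only the data columns
```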
DATASET RESOURCES
• Kaggle Datasets
• datasetsearch.research.google.com
EXERCISE
8. Create a new column ‘age’ and deduce the age from the ‘birthyear’ column.
12. Export the DataFrame as a CSV file named ‘alive.csv’.
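One possible approach to step 8 is plain column arithmetic. The sketch below is only a starting point: the rows are hypothetical, only the 'birthyear' column name comes from the exercise, and the reference year is an assumption.

```python
import pandas as pd

# Hypothetical rows; only the 'birthyear' column name comes from the exercise
df = pd.DataFrame({'name': ['Ann', 'Ben'], 'birthyear': [1950, 1962]})

reference_year = 2023  # assumption: the year in which the course was held

# Subtracting a column from a scalar is applied to every row at once
df['age'] = reference_year - df['birthyear']
print(df)
```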
REFERENCES
• pandas.pydata.org/docs/getting_started/index.html
• www.kaggle.com/learn/pandas
• github.com/unitedstates/congress-legislators
THANK YOU