
Important libraries:

• Numpy – arrays

• Matplotlib – plotting and data visualization

• Pandas – dataframe manipulation

• String – string manipulation

• Wget – downloading files from the web

• Qgrid – dataframe visualization

• Zipfile – zip file manipulation

• Investpy – retrieving financial data from Investing.com

• Ipywidgets – interactive controls for graphs (interact)

F-strings – A faster, less verbose way of embedding variables and expressions in a string. Ex: k = "Allan" / f'{k} is a genius'

If the sentence itself contains quotes, use a different quote mark to delimit the f-string.
Ex: f"{k} told 'Good morning' to the teacher".
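
A minimal sketch of the f-string notes above. The variable price and the format specifier are assumptions added for illustration; they are not part of the original notes.

```python
k = "Allan"

# Expressions inside the braces are evaluated and inserted in place.
greeting = f"{k} is a genius"

# Double quotes outside let the sentence contain single quotes.
quoted = f"{k} told 'Good morning' to the teacher"

# A format specifier follows a colon: thousands separator, two decimals.
price = 1234567.891  # hypothetical value
formatted = f"{price:,.2f}"

print(greeting)
print(quoted)
print(formatted)
```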

Creating lists with a for loop – Start from an empty list and call append() on each iteration. To combine dataframes, use pd.concat() instead.

import pandas as pd

arq = []

for i in range(2011, 2021):

    arq.append(f'qualquer_nome_{i}')

df = pd.DataFrame(arq, columns=["arquivo"]) – creates a dataframe and stores qualquer_nome_{i}, where i is the year (ano), in the column "arquivo" (file)
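
The concat() route mentioned above could look like the sketch below. It assumes one small dataframe per year is built in the loop. The "ano" column is hypothetical and only illustrates the idea; in practice each frame might come from a file.

```python
import pandas as pd

# Collect one small dataframe per year in a plain list.
frames = []
for year in range(2011, 2021):
    frames.append(
        pd.DataFrame({"arquivo": [f"qualquer_nome_{year}"], "ano": [year]})
    )

# Concatenate once at the end; ignore_index renumbers the rows from 0.
df = pd.concat(frames, ignore_index=True)
print(df)
```

Concatenating once after the loop is usually cheaper than growing a dataframe inside it.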

qgrid.show_grid(dataframe) – opens the dataframe for visualization with grids and filters.

Sorting items

df.set_index(['column A', 'column B']) – sets the listed columns as the index of the dataframe (returns a new dataframe)

df.sort_values(by=['Column_A', 'Column_B']) – sorts the rows by Column_A, then by Column_B

df['Column_A'] = df['Column_A'].map("{:,}".format) – formats every value of the column with a comma as thousands separator. The values become strings.

df.T – returns the transpose of df
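
A small sketch of the sorting and formatting calls above on made-up data. The column "Value" and all the numbers are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Column_A": [2000, 1000, 2000],
    "Column_B": ["b", "c", "a"],
    "Value": [1234567, 89, 4500],  # hypothetical column
})

# Sort by Column_A first, then by Column_B where Column_A is equal.
sorted_df = df.sort_values(by=["Column_A", "Column_B"])

# A list of column names produces a MultiIndex.
indexed = df.set_index(["Column_A", "Column_B"])

# Formatting turns numbers into strings, so keep it for display only.
display = df.copy()
display["Value"] = display["Value"].map("{:,}".format)

print(sorted_df)
print(indexed)
print(display)
print(df.T)
```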

Filtering

Simple filtering

df = df[df['Column_A'] == 'filter'] – returns a dataframe with only the rows where 'Column_A' equals the string 'filter'.

df1 = df["Column_A"].str.contains("Filter") – returns a Boolean Series (not a dataframe) indicating whether each row of 'Column_A' contains the string 'Filter'.

Calling df[df1] applies the Boolean mask df1 to df and keeps only the rows where it is True.
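
The two filters above, sketched on made-up data. The na=False and case=False arguments are additions not mentioned in the notes: the first treats missing values as non-matches, the second ignores letter case.

```python
import pandas as pd

df = pd.DataFrame({"Column_A": ["filter", "Filter me", "other", None]})

# Exact match: keeps only the rows equal to "filter".
exact = df[df["Column_A"] == "filter"]

# Substring match: a Boolean Series used as a mask.
mask = df["Column_A"].str.contains("Filter", na=False)
partial = df[mask]

# Case-insensitive substring match.
either_case = df[df["Column_A"].str.contains("filter", case=False, na=False)]

print(exact)
print(partial)
print(either_case)
```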

Data Analysis

df.agg({"column_A": ["min", "max", "mean", "median", "skew"]}) – returns the listed aggregate statistics for column A

df.describe() – returns summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
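
A quick sketch of both calls on a made-up numeric column; the values are assumptions chosen so the results are easy to check by hand.

```python
import pandas as pd

df = pd.DataFrame({"column_A": [1.0, 2.0, 3.0, 4.0, 10.0]})

# One row per aggregation, one column per aggregated column.
stats = df.agg({"column_A": ["min", "max", "mean", "median", "skew"]})

summary = df.describe()

print(stats)
print(summary)
```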

Data Manipulation

df.shape – shows the dimensions of the dataframe you're looking at, as (rows, columns).

replace() – This is the pandas method, not the string method str.replace(). Given a dictionary, it replaces multiple values in the dataframe at once. Ex:
titanic["Sex_short"] = titanic["Sex"].replace({"Male": "M", "Female": "F"}) – creates a column named
"Sex_short", copies the values from "Sex" and replaces them with the short forms for male and female.
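
A sketch contrasting replace() with str.replace() on made-up data. The "Unknown" value and the "Sex_substr" column are assumptions for illustration. Matching in replace() is exact and case-sensitive, so the dictionary keys must be spelled exactly as the values in the column.

```python
import pandas as pd

titanic = pd.DataFrame({"Sex": ["Male", "Female", "Female", "Unknown"]})

# replace() matches whole values; values not in the dictionary pass through.
titanic["Sex_short"] = titanic["Sex"].replace({"Male": "M", "Female": "F"})

# str.replace() works on substrings inside each value.
titanic["Sex_substr"] = titanic["Sex"].str.replace("ale", "ALE", regex=False)

print(titanic)
```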

pd.to_numeric(dataframe["column_A"]) – returns the data of column A converted to a numeric type; assign the result back to the column to keep it
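
A minimal sketch of the conversion. The errors="coerce" argument is an addition not mentioned in the notes; it turns values that cannot be parsed into NaN instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({"column_A": ["1", "2.5", "n/a"]})

# to_numeric returns a new Series, so assign it back to the column.
df["column_A"] = pd.to_numeric(df["column_A"], errors="coerce")

print(df)
print(df["column_A"].dtype)
```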

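resample() – regroups time-series data at a new frequency (needs a datetime-like index); follow it with an aggregation such as sum() or mean()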
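
A sketch of resampling on made-up daily data; the dates, the values and the 2-day frequency are assumptions for illustration.

```python
import pandas as pd

# Six daily observations with a DatetimeIndex.
idx = pd.date_range("2021-01-01", periods=6, freq="D")
daily = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=idx)

# Group the days into 2-day bins and sum each bin.
two_day = daily.resample("2D").sum()

print(two_day)
```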

Query – Allows one to filter the dataframe based on conditions. Ex: df.query('a > b')

The query expression must be passed as a string, inside quote marks. Column names containing spaces must
be wrapped in backticks ` `. Ex: df.query('`Col Ex` == "Improving"'). String values inside the expression
need quote marks different from the outer ones, e.g. double quotes inside single quotes.
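
Both query forms above, sketched on made-up data; the values are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [5, 1, 7],
    "b": [2, 3, 7],
    "Col Ex": ["Improving", "Stable", "Improving"],
})

# Compare two columns row by row.
greater = df.query("a > b")

# Backticks around the column name with a space, double quotes around the string.
improving = df.query('`Col Ex` == "Improving"')

print(greater)
print(improving)
```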
