
Python

Cheat Sheet

Python | Pandas
Data Analysis
Data Visualization
by Frank Andrade
Python Basics
Cheat Sheet

Here you will find all the Python core concepts you need to know before learning any third-party library.

Data Types
Integers (int): 1
Float (float): 1.2
String (str): "Hello World"
Boolean: True/False
List: [value1, value2]
Dictionary: {key1:value1, key2:value2, ...}

Variables
Variable assignment:
message_1 = "I'm learning Python"
message_2 = "and it's fun!"

String concatenation (+ operator):
message_1 + ' ' + message_2

String concatenation (f-string):
f'{message_1} {message_2}'

List
Creating a list:
countries = ['United States', 'India', 'China', 'Brazil']

Creating a new list:
numbers = [4, 3, 10, 7, 1, 2]

Sorting a list:
>>> numbers.sort()
>>> numbers
[1, 2, 3, 4, 7, 10]

>>> numbers.sort(reverse=True)
>>> numbers
[10, 7, 4, 3, 2, 1]

Update value on a list:
>>> numbers[0] = 1000
>>> numbers
[1000, 7, 4, 3, 2, 1]

Copying a list:
new_list = countries[:]
new_list_2 = countries.copy()
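As a small illustration of the sorting and copying entries above, the sketch below shows that sort() reorders a list in place and returns None, and that a copy made with [:] or .copy() is independent of the original. The built-in sorted() is not listed on this sheet and is included only for contrast; the names sorted_numbers, result, backup and backup_2 are chosen just for this example.

```python
numbers = [4, 3, 10, 7, 1, 2]

# sorted() returns a new list and leaves the original untouched
sorted_numbers = sorted(numbers)

# sort() reorders the list in place and returns None
result = numbers.sort(reverse=True)

countries = ['United States', 'India', 'China', 'Brazil']

# Both forms create a shallow copy that is independent of the original list
backup = countries[:]
backup_2 = countries.copy()
countries.append('Canada')

print(sorted_numbers)   # [1, 2, 3, 4, 7, 10]
print(numbers)          # [10, 7, 4, 3, 2, 1]
print(result)           # None
print(len(countries), len(backup), len(backup_2))  # 5 4 4
```

Assigning the return value of sort() to a variable is a common slip, which is why the sketch prints it explicitly.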

Create an empty list:
my_list = []

Indexing:
>>> countries[0]
'United States'
>>> countries[3]
'Brazil'
>>> countries[-1]
'Brazil'

Slicing:
>>> countries[0:3]
['United States', 'India', 'China']
>>> countries[1:]
['India', 'China', 'Brazil']
>>> countries[:2]
['United States', 'India']

Adding elements to a list:
countries.append('Canada')
countries.insert(0, 'Canada')

Nested list:
nested_list = [countries, countries_2]

Remove element:
countries.remove('United States')
countries.pop(0)  # removes and returns value
del countries[0]

Numeric Operators
+    Addition
-    Subtraction
*    Multiplication
/    Division
**   Exponent
%    Modulus
//   Floor division

Comparison Operators
==   Equal to
!=   Not equal to
>    Greater than
<    Less than
>=   Greater than or equal to
<=   Less than or equal to

String methods
string.upper(): converts to uppercase
string.lower(): converts to lowercase
string.title(): converts to title case
string.count('l'): counts how many times "l" appears
string.find('h'): position of the first occurrence of "h"
string.replace('o', 'u'): replaces "o" with "u"

Built-in Functions
Print an object:
print("Hello World")

Return the length of x:
len(x)

Return the minimum value:
min(x)

Return the maximum value:
max(x)

Returns a sequence of numbers:
range(x1, x2, n)  # from x1 up to, but not including, x2 (increments by n)

Convert x to a string:
str(x)

Convert x to an integer/float:
int(x)
float(x)

Convert x to a list:
list(x)
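The built-in functions above can be tried together in a short sketch. The sample values (the countries list and the strings '42' and '1.5') are picked only for illustration. Note that range() stops before its second argument.

```python
countries = ['United States', 'India', 'China', 'Brazil']

# range(x1, x2, n) counts from x1 up to, but not including, x2 in steps of n
evens = list(range(0, 10, 2))

# len, min and max work on any list of comparable values
count = len(countries)
first_alphabetically = min(countries)
last_alphabetically = max(countries)

# str, int and float convert between text and numbers
as_text = str(count)
as_int = int('42')
as_float = float('1.5')

print(evens)                                             # [0, 2, 4, 6, 8]
print(count, first_alphabetically, last_alphabetically)  # 4 Brazil United States
print(as_text + ' countries', as_int + 1, as_float * 2)  # 4 countries 43 3.0
```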
Dictionary
Creating a dictionary:
my_data = {'name':'Frank', 'age':26}

Create an empty dictionary:
my_dict = {}

Get value of key "name":
>>> my_data["name"]
'Frank'

Get the keys:
>>> my_data.keys()
dict_keys(['name', 'age'])

Get the values:
>>> my_data.values()
dict_values(['Frank', 26])

Get the key-value pairs:
>>> my_data.items()
dict_items([('name', 'Frank'), ('age', 26)])

Adding/updating items in a dictionary:
my_data['height']=1.7
my_data.update({'height':1.8,
                'languages':['English', 'Spanish']})
>>> my_data
{'name': 'Frank',
 'age': 26,
 'height': 1.8,
 'languages': ['English', 'Spanish']}

Remove an item:
my_data.pop('height')
del my_data['languages']
my_data.clear()  # removes all items

Copying a dictionary:
new_dict = my_data.copy()

If Statement
Conditional test:
if <condition>:
    <code>
elif <condition>:
    <code>
...
else:
    <code>

Example:
if age>=18:
    print("You're an adult!")

Conditional test with list:
if <value> in <list>:
    <code>

Loops
For loop:
for <variable> in <list>:
    <code>

For loop and enumerate list elements:
for i, element in enumerate(<list>):
    <code>

For loop and obtain dictionary elements:
for key, value in my_dict.items():
    <code>

While loop:
while <condition>:
    <code>

Loop control statement:
break: stops loop execution
continue: jumps to next iteration
pass: does nothing

Functions
Create a function:
def function(<params>):
    <code>
    return <data>

Modules
Import module:
import module
module.method()

OS module:
import os
os.getcwd()
os.listdir()
os.makedirs(<path>)

Data Validation
Try-except:
try:
    <code>
except <error>:
    <code>

Special Characters
#    Comment
\n   New Line

Boolean Operators        Boolean Operators (Pandas)
and   logical AND        &   logical AND
or    logical OR         |   logical OR
not   logical NOT        ~   logical NOT

Below are my guides, tutorials and complete Python courses:
- Medium Guides
- YouTube Tutorials
- Udemy Courses
Made by Frank Andrade frank-andrade.medium.com
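To see how the dictionary, loop, conditional and try-except entries on this page fit together, here is a minimal sketch. The ages dictionary and the adults list are invented sample data, not part of the original sheet.

```python
# Hypothetical sample data: one age is stored as text that cannot be converted
ages = {'Frank': '26', 'Ana': 'unknown', 'Luis': '15', 'Mia': '40'}

adults = []
for name, raw_age in ages.items():
    try:
        age = int(raw_age)      # raises ValueError for 'unknown'
    except ValueError:
        continue                # skip this entry and jump to the next iteration
    if age >= 18:
        adults.append(name)

print(adults)  # ['Frank', 'Mia']
```

Catching only ValueError, rather than every exception, keeps unrelated errors visible.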
Pandas
Cheat Sheet

Pandas provides data analysis tools for Python. All of the following code examples refer to the dataframe below.

df =
       col1  col2
    A     1     4
    B     2     5
    C     3     6

(axis 0 runs down the rows, axis 1 runs across the columns)

Selecting rows and columns
Select single column:
df['col1']

Select multiple columns:
df[['col1', 'col2']]

Show first n rows:
df.head(2)

Show last n rows:
df.tail(2)

Select rows by index values:
df.loc['A']
df.loc[['A', 'B']]

Select rows by position:
df.iloc[1]
df.iloc[1:]

Merge multiple data frames horizontally:
df3 = pd.DataFrame([[1, 7], [8, 9]],
                   index=['B', 'D'],
                   columns=['col1', 'col3'])
# df3: new dataframe

Only merge complete rows (INNER JOIN):
df.merge(df3)

Keep all rows of the left data frame (LEFT OUTER JOIN):
df.merge(df3, how='left')

Keep all rows of the right data frame (RIGHT OUTER JOIN):
df.merge(df3, how='right')

Preserve all values (OUTER JOIN):
df.merge(df3, how='outer')

Merge rows by index:
df.merge(df3, left_index=True,
         right_index=True)
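The join types above are easier to compare side by side. The sketch below rebuilds the sheet's df and df3 and runs each merge. It assumes the default behaviour of merge(), which joins on the columns both frames share (here only 'col1'). The result names inner, left, outer and by_index are made up for this example.

```python
import pandas as pd

# The sheet's example dataframes
df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'],
                  columns=['col1', 'col2'])
df3 = pd.DataFrame([[1, 7], [8, 9]],
                   index=['B', 'D'],
                   columns=['col1', 'col3'])

# With no "on" argument, merge joins on the shared column 'col1'
inner = df.merge(df3)                # only col1 == 1 appears in both frames
left = df.merge(df3, how='left')     # all 3 rows of df, NaN where df3 has no match
outer = df.merge(df3, how='outer')   # every col1 value from either frame

# Joining on the index instead: only label 'B' exists in both frames,
# and the clashing 'col1' columns get the suffixes _x and _y
by_index = df.merge(df3, left_index=True, right_index=True)

print(inner)
print(left)
print(outer)
print(by_index)
```

Note that merging on columns discards the original index labels, while merging on the index keeps them.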

Getting Started
Import pandas:
import pandas as pd

Create a series:
s = pd.Series([1, 2, 3],
              index=['A', 'B', 'C'],
              name='col1')

Create a dataframe:
data = [[1, 4], [2, 5], [3, 6]]
index = ['A', 'B', 'C']
df = pd.DataFrame(data, index=index,
                  columns=['col1', 'col2'])

Read a csv file with pandas:
df = pd.read_csv('filename.csv')

Advanced parameters:
df = pd.read_csv('filename.csv', sep=',',
                 names=['col1', 'col2'],
                 index_col=0,
                 encoding='utf-8',
                 nrows=3)

Data wrangling
Filter by value:
df[df['col1'] > 1]

Sort by one column:
df.sort_values('col1')

Sort by columns:
df.sort_values(['col1', 'col2'],
               ascending=[False, True])

Identify duplicate rows:
df.duplicated()

Identify unique values in a column:
df['col1'].unique()

Swap rows and columns:
df = df.transpose()
df = df.T

Drop a column:
df = df.drop('col1', axis=1)

Clone a data frame:
clone = df.copy()

Connect multiple data frames vertically:
df2 = df + 5  # new dataframe
pd.concat([df, df2])

Fill NaN values:
df.fillna(0)

Apply your own function:
def func(x):
    return 2**x
df.apply(func)

Arithmetics and statistics
Add to all values:
df + 10

Sum of each column:
df.sum()

Cumulative sum of each column:
df.cumsum()

Mean of each column:
df.mean()

Standard deviation of each column:
df.std()

Count occurrences of each value:
df['col1'].value_counts()

Summarize descriptive statistics:
df.describe()
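Several of the wrangling and statistics entries above can be checked against the sheet's three-row dataframe. This is only a sketch: the result names (filtered, by_col1_desc, stacked, col_sums, col_means, powers) are invented here, and the lambda stands in for the sheet's func.

```python
import pandas as pd

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'],
                  columns=['col1', 'col2'])

# A boolean mask keeps only the rows where col1 is greater than 1
filtered = df[df['col1'] > 1]

# Sort by col1 in descending order
by_col1_desc = df.sort_values('col1', ascending=False)

# Stack df and a shifted copy vertically; the index labels repeat
df2 = df + 5
stacked = pd.concat([df, df2])

# Column-wise statistics: one result per column
col_sums = df.sum()
col_means = df.mean()

# apply passes each column (a Series) to the function
powers = df.apply(lambda column: 2 ** column)

print(filtered)
print(stacked)
print(col_sums.to_dict())    # {'col1': 6, 'col2': 15}
print(col_means.to_dict())   # {'col1': 2.0, 'col2': 5.0}
print(powers)
```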

Hierarchical indexing
Create hierarchical index:
df.stack()

Dissolve hierarchical index:
df.unstack()

Data export
Data as NumPy array:
df.values

Save data as CSV file:
df.to_csv('output.csv', sep=",")

Format a dataframe as tabular string:
df.to_string()

Convert a dataframe to a dictionary:
df.to_dict()

Save a dataframe as an Excel table:
df.to_excel('output.xlsx')

Visualization
The plots below are made with a dataframe with the shape of df_gdp (pivot() method).

Import matplotlib:
import matplotlib.pyplot as plt

Start a new diagram:
plt.figure()

Scatter plot:
df.plot(kind='scatter', x='col1', y='col2')

Bar plot:
df.plot(kind='bar',
        xlabel='data1',
        ylabel='data2')

Lineplot:
df.plot(kind='line',
        figsize=(8,4))

Boxplot:
df['col1'].plot(kind='box')

Histogram over one column:
df['col1'].plot(kind='hist',
                bins=3)

Piechart:
df.plot(kind='pie',
        y='col1',
        title='Population')

Set tick marks:
labels = ['A', 'B', 'C', 'D']
positions = [1, 2, 3, 4]
plt.xticks(positions, labels)
plt.yticks(positions, labels)

Label diagram and axes:
plt.title('Correlation')
plt.xlabel('Nunstück')
plt.ylabel('Slotermeyer')

Save most recent diagram:
plt.savefig('plot.png')
plt.savefig('plot.png', dpi=300)
plt.savefig('plot.svg')

Aggregation
Create group object:
g = df.groupby('col1')

Iterate over groups:
for i, group in g:
    print(i, group)

Aggregate groups:
g.sum()
g.prod()
g.mean()
g.std()
g.describe()

Select columns from groups:
g['col2'].sum()
g[['col2', 'col3']].sum()

Transform values:
import numpy as np
g.transform(np.log)  # math.log only accepts single numbers, so NumPy's log is used

Apply a list function on each group:
def strsum(group):
    return ''.join([str(x) for x in group.values])

g['col2'].apply(strsum)

Pivot and Pivot Table
Read file 1 (csv):
df_gdp = pd.read_csv('gdp.csv')

The pivot() method:
df_gdp.pivot(index="year",
             columns="country",
             values="gdppc")

Read file 2 (Excel):
df_sales = pd.read_excel(
    'supermarket_sales.xlsx')

Make pivot table:
df_sales.pivot_table(index='Gender',
                     aggfunc='sum')

Make a pivot table that shows how much males and females spend in each category:
df_sales.pivot_table(index='Gender',
                     columns='Product line',
                     values='Total',
                     aggfunc='sum')
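Since gdp.csv and supermarket_sales.xlsx are not included with this sheet, the sketch below uses tiny made-up stand-ins that reuse the same column names ('Gender', 'Product line', 'Total', 'year', 'country', 'gdppc'). All values and the result names spend and wide are hypothetical. It is meant only to show the difference between pivot(), which reshapes, and pivot_table(), which also aggregates.

```python
import pandas as pd

# Hypothetical stand-in for supermarket_sales.xlsx: same column names, made-up values
df_sales = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female', 'Male', 'Female'],
    'Product line': ['Sports', 'Sports', 'Food', 'Food', 'Food'],
    'Total': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# One row per gender, one column per product line, cells hold the summed totals
spend = df_sales.pivot_table(index='Gender',
                             columns='Product line',
                             values='Total',
                             aggfunc='sum')
print(spend)

# Hypothetical stand-in for gdp.csv: exactly one row per (year, country) pair
df_gdp = pd.DataFrame({
    'year': [2000, 2000, 2001, 2001],
    'country': ['Peru', 'Chile', 'Peru', 'Chile'],
    'gdppc': [1.0, 2.0, 3.0, 4.0],
})

# pivot() only reshapes; unlike pivot_table() it does not aggregate
wide = df_gdp.pivot(index='year', columns='country', values='gdppc')
print(wide)
```

In the made-up sales data, the two 'Female'/'Food' rows are combined into one cell, which is the aggregation step that pivot() alone would refuse to do.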

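Returning to the Aggregation entries, the following sketch shows how aggregating, transforming and applying differ in the shape of their results. The dataframe here is hypothetical: it reuses the column names col1 and col2 but has repeated keys so that each group holds more than one row. NumPy's log is used as the transform function because math.log works only on single numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical data with repeated keys so that each group has several rows
df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                   'col2': [1, 2, 3, 4]})

g = df.groupby('col1')

# Aggregation: one result per group
sums = g['col2'].sum()

# Transformation: the result has the same length as the input
logs = g['col2'].transform(np.log)

# apply calls the function once per group with that group's Series
def strsum(group):
    return ''.join(str(x) for x in group.values)

joined = g['col2'].apply(strsum)

print(sums.to_dict())     # {'a': 3, 'b': 7}
print(len(logs))          # 4
print(joined.to_dict())   # {'a': '12', 'b': '34'}
```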