Python Cheatsheet:
Data Science Basics
This cheat sheet summarizes common and useful functionality from Pandas, NumPy, and Scikit-Learn. For
the most up-to-date full version, visit the online cheat sheet at [Link].
Setup
First, make sure you have the following installed on your computer:
• Python 2.7+ or Python 3
• Pandas
• Jupyter Notebook (optional, but recommended)
*Note: We strongly recommend installing the Anaconda Distribution, which comes with all of those packages.

Importing Data
pd.read_csv(filename)
pd.read_table(filename)
pd.read_excel(filename)
pd.read_sql(query, connection_object)
pd.read_json(json_string)
pd.read_html(url)
pd.read_clipboard()
[Link](dict)

Data Cleaning
[Link] = ['a','b','c']
[Link]()
[Link]()
[Link]()
[Link](axis=1)
[Link](axis=1,thresh=n)
[Link](x)
[Link]([Link]())
[Link](float)
[Link](1,'one')
[Link]([1,3],['one','three'])
[Link](columns=lambda x: x + 1)
[Link](columns={'old_name': 'new_name'})
df.set_index('column_one')
[Link](index=lambda x: x + 1)
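
A minimal sketch of an import-and-clean workflow, using only standard pandas calls. The CSV text, the column names (`a`, `b`, `label`), and the choice of cleaning steps are assumptions made for illustration; they are not necessarily the calls elided in the listings of this cheat sheet.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "a,b,c\n1,2.5,x\n2,,y\n3,4.5,\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Rename a column with a dict mapping old names to new names.
df = df.rename(columns={"c": "label"})

# Count missing values per column.
missing_per_column = df.isnull().sum()

# Fill missing values in one column with that column's mean.
filled = df.copy()
filled["b"] = filled["b"].fillna(filled["b"].mean())

# Drop rows that still contain any missing value.
cleaned = filled.dropna()

# Convert a column's dtype, then replace one value with another.
cleaned = cleaned.astype({"a": float})
cleaned = cleaned.assign(a=cleaned["a"].replace(1.0, 10.0))

# Use an existing column as the index.
indexed = cleaned.set_index("label")
```

Each step returns a new object instead of modifying `df` in place, which keeps intermediate results available for inspection in a notebook.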
Exploring Data
[Link]()
[Link](n)
[Link](n)
[Link]()
[Link]()
s.value_counts(dropna=False)
[Link]([Link].value_counts)
[Link]()
[Link]()
[Link]()
[Link]()
[Link]()
[Link]()
[Link]()
[Link]()

Filter, Sort and Group By
df[df[col] > 0.5]
df[(df[col] > 0.5) & (df[col] < 0.7)]
df.sort_values(col1)
df.sort_values(col2,ascending=False)
df.sort_values([col1,col2], ascending=[True,False])
[Link](col)
[Link]([col1,col2])
[Link](col1)[col2].mean()
df.pivot_table(index=col1, values=[col2,col3], aggfunc='mean')
[Link](col1).agg([Link])
[Link]([Link])
[Link]([Link], axis=1)

Joining and Combining
[Link](df2)
[Link]([df1, df2],axis=1)
[Link](df2,on=col1,how='inner')
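
The following sketch shows how exploring, filtering, sorting, grouping, and combining might fit together on a small frame. The data and the names `team`, `year`, `score`, and `region` are hypothetical. Rows are stacked with `pd.concat` here because `DataFrame.append` was removed in pandas 2.0, and `merge` is used as one of several ways to join on a column.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "year": [2020, 2021, 2020, 2021],
    "score": [0.4, 0.6, 0.65, 0.9],
})

# Explore: first rows and value counts (including NaN, if any).
first_rows = df.head(2)
team_counts = df["team"].value_counts(dropna=False)

# Filter: combine boolean masks with & and parenthesize each condition.
mid = df[(df["score"] > 0.5) & (df["score"] < 0.7)]

# Sort: ascending by team, then descending by score within each team.
ranked = df.sort_values(["team", "score"], ascending=[True, False])

# Group: mean score per team.
mean_by_team = df.groupby("team")["score"].mean()

# Pivot: one row per team, mean of each listed value column.
pivot = df.pivot_table(index="team", values=["score", "year"], aggfunc="mean")

# Combine: stack rows from a second frame.
extra = pd.DataFrame({"team": ["c"], "year": [2021], "score": [0.3]})
stacked = pd.concat([df, extra], ignore_index=True)

# Join: SQL-style inner join on a shared column.
regions = pd.DataFrame({"team": ["a", "b"], "region": ["north", "south"]})
merged = df.merge(regions, on="team", how="inner")
```

Passing `aggfunc` as the string `"mean"` avoids depending on a bare `mean` name being defined in the session.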
Selecting
df[col]
df[[col1, col2]]
[Link][0]
[Link][0]
[Link][0,:]
[Link][0,0]

Writing Data
df.to_csv(filename)
df.to_excel(filename)
df.to_sql(table_name, connection_object)
df.to_json(filename)
df.to_html(filename)
df.to_clipboard()
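
As a rough illustration of selecting and writing, the sketch below picks out columns, rows, and single cells, then writes the frame to an in-memory buffer instead of a file. The frame, its labels (`r1` to `r3`), and the use of `io.StringIO` in place of a filename are assumptions for the example.

```python
import io

import pandas as pd

# Hypothetical example data with string row labels.
df = pd.DataFrame(
    {"name": ["x", "y", "z"], "value": [10, 20, 30]},
    index=["r1", "r2", "r3"],
)

# Column selection: one label gives a Series, a list gives a DataFrame.
col = df["value"]
sub = df[["name", "value"]]

# Position-based versus label-based selection on a Series.
first_by_position = col.iloc[0]
first_by_label = col.loc["r1"]

# First row and first cell of the DataFrame, by position.
first_row = df.iloc[0, :]
cell = df.iloc[0, 0]

# Write CSV to a buffer; passing a filename instead writes to disk.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()

# With no path argument, to_json returns the JSON as a string.
json_text = df.to_json()

# Read the CSV back, restoring the first column as the index.
round_trip = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

`iloc` always counts positions from zero, while `loc` looks up index labels, so the two only coincide when the index happens to be the default integer range.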
[Link]