Python Data Science Cheatsheet

This document provides a summary of common functionality from Pandas, NumPy, and Scikit-Learn for data science basics. It covers topics such as importing and exploring data, cleaning data, filtering and grouping data, joining data, and writing data out. The full cheatsheet can be found online at elitedatascience.com.

Uploaded by

acutotu

Python Cheatsheet: Data Science Basics

In this cheat sheet, we summarize common and useful functionality from Pandas, NumPy, and Scikit-Learn. To see
the most up-to-date full version, visit the online cheatsheet at [Link].

Setup

First, make sure you have the following installed on your computer:

• Python 2.7+ or Python 3
• Pandas
• Jupyter Notebook (optional, but recommended)

*Note: We strongly recommend installing the Anaconda Distribution, which comes with all of those packages.

Importing Data

pd.read_csv(filename)
pd.read_table(filename)
pd.read_excel(filename)
pd.read_sql(query, connection_object)
pd.read_json(json_string)
pd.read_html(url)
pd.read_clipboard()
[Link](dict)

Data Cleaning

[Link] = ['a','b','c']
[Link]()
[Link]()
[Link]()
[Link](axis=1)
[Link](axis=1,thresh=n)
[Link](x)
[Link]([Link]())
[Link](float)
[Link](1,'one')
[Link]([1,3],['one','three'])
[Link](columns=lambda x: x + 1)
[Link](columns={'old_name': 'new_name'})
df.set_index('column_one')
[Link](index=lambda x: x + 1)
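
The importing and cleaning entries can be tried end to end with a small in-memory file. This is a minimal sketch, not the cheat sheet's own example: several method names in the source were lost to `[Link]` placeholders, so `pd.DataFrame`, `dropna`, `fillna`, `astype`, `replace`, and `rename` below are assumptions inferred from the surviving arguments. The CSV text and column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for `filename`.
csv_text = "x,y,z\n1,2.5,\n3,,\n1,4.5,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Build a DataFrame from a dict (assumed reading of the `(dict)` entry).
from_dict = pd.DataFrame({"k": [1, 2], "v": ["a", "b"]})

# Rename all columns at once.
df.columns = ["a", "b", "c"]

# Drop every column that contains a missing value (assumed: dropna).
complete_cols = df.dropna(axis=1)

# Keep only columns with at least n non-null values.
n = 2
dense_cols = df.dropna(axis=1, thresh=n)

# Fill missing values in one column with that column's mean (assumed: fillna).
b_filled = df["b"].fillna(df["b"].mean())

# Convert a column's dtype and replace values (assumed: astype, replace).
a_float = df["a"].astype(float)
a_words = df["a"].replace([1, 3], ["one", "three"])

# Rename columns with a function or a mapping (assumed: rename).
upper = df.rename(columns=lambda name: name.upper())
renamed = df.rename(columns={"a": "first"})

# Use an existing column as the index.
indexed = df.set_index("a")
```

None of these calls modifies `df` in place; each returns a new object, which is why the results are assigned to new names.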

Exploring Data

[Link]()
[Link](n)
[Link](n)
[Link]()
[Link]()
s.value_counts(dropna=False)
[Link]([Link].value_counts)
[Link]()
[Link]()
[Link]()
[Link]()
[Link]()
[Link]()
[Link]()
[Link]()

Filter, Sort and Group By

df[df[col] > 0.5]
df[(df[col] > 0.5) & (df[col] < 0.7)]
df.sort_values(col1)
df.sort_values(col2,ascending=False)
df.sort_values([col1,col2], ascending=[True,False])
[Link](col)
[Link]([col1,col2])
[Link](col1)[col2].mean()
df.pivot_table(index=col1, values=[col2,col3], aggfunc='mean')
[Link](col1).agg([Link])
[Link]([Link])
[Link]([Link], axis=1)

Joining and Combining

[Link](df2)
[Link]([df1, df2],axis=1)
[Link](df2,on=col1,how='inner')
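
The filtering, sorting, grouping, and combining entries can be sketched on a toy frame as follows. The data and column names are hypothetical. Where the source shows only a `[Link]` placeholder, the method used here (`groupby`, `pd.concat`) is an assumption. The key-based join uses `merge`, which is a substitution for whichever method the source listed, and row stacking is shown with `pd.concat` instead of guessing at the lost name.

```python
import pandas as pd

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [0.4, 0.6, 0.65, 0.9],
    "games": [1, 2, 3, 4],
})

# Count occurrences of each value, including missing ones.
team_counts = df["team"].value_counts(dropna=False)

# Boolean filter: combine conditions with & and parenthesize each one.
mid = df[(df["score"] > 0.5) & (df["score"] < 0.7)]

# Sort by one column ascending and another descending.
ordered = df.sort_values(["team", "score"], ascending=[True, False])

# Group by one column and average another (assumed: groupby).
team_mean = df.groupby("team")["score"].mean()

# Pivot table over two value columns with the same grouping.
pivot = df.pivot_table(index="team", values=["score", "games"], aggfunc="mean")

# Combine frames side by side (columns) or stacked (rows) with concat.
bonus = pd.DataFrame({"bonus": [10, 20, 30, 40]})
side_by_side = pd.concat([df, bonus], axis=1)
stacked = pd.concat([df, df], ignore_index=True)

# SQL-style inner join on a shared key column (merge used as a stand-in).
cities = pd.DataFrame({"team": ["a", "b"], "city": ["Oslo", "Lima"]})
joined = df.merge(cities, on="team", how="inner")
```

Passing the aggregation as the string `'mean'` avoids depending on a bare `mean` name being defined in the session.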

Selecting

df[col]
df[[col1, col2]]
[Link][0]
[Link][0]
[Link][0,:]
[Link][0,0]

Writing Data

df.to_csv(filename)
df.to_excel(filename)
df.to_sql(table_name, connection_object)
df.to_json(filename)
df.to_html(filename)
df.to_clipboard()
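
A short round trip shows selection and writing together. This is a sketch under stated assumptions: the position-based entries in the source are `[Link]` placeholders, so `iloc` is assumed here. An in-memory buffer and an in-memory SQLite database stand in for `filename` and `connection_object`, and the table name `items` is hypothetical.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["x", "y"], "value": [1, 2]})

# Label-based column selection.
one_col = df["name"]               # a Series
two_cols = df[["name", "value"]]   # a DataFrame

# Position-based selection (assumed: iloc).
first_row = df.iloc[0, :]
first_cell = df.iloc[0, 0]

# Write CSV to an in-memory buffer, then read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

# Write to a SQL table and query it back through a sqlite3 connection.
conn = sqlite3.connect(":memory:")
df.to_sql("items", conn, index=False)
from_sql = pd.read_sql("SELECT * FROM items", conn)
conn.close()

# With no path argument, to_json returns the JSON text as a string.
json_text = df.to_json()
```

Passing `index=False` keeps the row index out of the written file and table, so the data reads back with the same two columns.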
