Outline

Introduction to Pandas

Importing data
Series and DataFrame objects
Indexing, data selection and subsetting
Hierarchical indexing
Reading and writing files
Sorting and ranking
Missing data
Data summarization

Data Wrangling with Pandas

Date/time types
Merging and joining DataFrame objects
Concatenation
Reshaping DataFrame objects
Pivoting
Data transformation
Permutation and sampling
Data aggregation and GroupBy operations

Plotting and Visualization

https://bigdata-madesimple.com/step-by-step-approach-to-perform-data-analysis-using-python/
Plotting in Pandas vs Matplotlib
Bar plots
Histograms
Box plots
Grouped plots
Scatterplots
Trellis plots

Statistical Data Modeling

Statistical modeling
Fitting data to probability distributions
Fitting regression models
Model selection
Bootstrapping

Required Packages

Python 2.7 or higher (including Python 3)
pandas >= 0.11.1 and its dependencies
NumPy >= 1.6.1
matplotlib >= 1.0.0
pytz
IPython >= 0.12
pyzmq
tornado
Optional: statsmodels, xlrd and openpyxl

For students running the latest version of Mac OS X (10.8), the easiest way to
obtain all the packages is to install the SciPy Superpack, which works with the
Python 2.7.2 that ships with OS X.

Otherwise, an easy way to install all the necessary packages is to use
Continuum Analytics' Anaconda.
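
After installing, it can help to confirm that the packages listed above are
importable and recent enough. The following is a minimal sketch, not part of
the original course material. It assumes that pyzmq is imported as `zmq` and
that each package exposes its version in a `__version__` attribute (packages
that do not are reported as "version unknown"). The names `REQUIRED`,
`version_tuple`, `meets_minimum` and `check_packages` are hypothetical helpers
introduced only for this illustration.

```python
import importlib
import re

# Minimum versions copied from the Required Packages list.
# None means the list states no minimum version.
# Keys are import names (assumption: pyzmq is imported as "zmq").
REQUIRED = {
    "pandas": "0.11.1",
    "numpy": "1.6.1",
    "matplotlib": "1.0.0",
    "pytz": None,
    "IPython": "0.12",
    "zmq": None,
    "tornado": None,
}


def version_tuple(version):
    """Convert '1.6.1' into (1, 6, 1); stop at the first non-numeric piece."""
    parts = []
    for piece in str(version).split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    if minimum is None:
        return True
    have = version_tuple(installed)
    need = version_tuple(minimum)
    # Pad with zeros so that '0.12' and '0.12.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_packages(required=REQUIRED):
    """Try to import each package and report its status as a string."""
    report = {}
    for name, minimum in required.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[name] = "installed (version unknown)"
        elif meets_minimum(installed, minimum):
            report[name] = "ok (%s)" % installed
        else:
            report[name] = "too old (%s < %s)" % (installed, minimum)
    return report
```

Calling `check_packages()` returns a dictionary that maps each import name to
a status string. Versions are compared as tuples of integers rather than as
strings, so that 1.10.0 is correctly treated as newer than 1.6.1.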
