Data Analysis Progress
Introduction to Pandas
Importing data
Series and DataFrame objects
Indexing, data selection and subsetting
Hierarchical indexing
Reading and writing files
Sorting and ranking
Missing data
Data summarization
Data Wrangling with Pandas
Date/time types
Merging and joining DataFrame objects
Concatenation
Reshaping DataFrame objects
Pivoting
Data transformation
Permutation and sampling
Data aggregation and GroupBy operations
Plotting and Visualization
https://bigdata-madesimple.com/step-by-step-approach-to-perform-data-analysis-using-python/
Plotting in Pandas vs Matplotlib
Bar plots
Histograms
Box plots
Grouped plots
Scatterplots
Trellis plots
Statistical Data Modeling
Statistical modeling
Fitting data to probability distributions
Fitting regression models
Model selection
Bootstrapping
Required Packages
Python 2.7 or higher (including Python 3)
pandas >= 0.11.1 and its dependencies
NumPy >= 1.6.1
matplotlib >= 1.0.0
pytz
IPython >= 0.12
pyzmq
tornado
Optional: statsmodels, xlrd and openpyxl
For students running Mac OS X 10.8, the easiest way to obtain all the packages
is to install the SciPy Superpack, which works with the Python 2.7.2 that ships
with OS X.
Otherwise, an easy way to install all the necessary packages is to use
Continuum Analytics' Anaconda distribution.
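
Whichever installation route is used, it can help to confirm that the installed packages meet the minimum versions listed above. The following is a minimal sketch, not part of the original course material: the helper names `version_tuple` and `check_packages` are hypothetical, the sketch assumes each package exposes a `__version__` attribute, and it compares only the leading numeric part of each version component. It is written for Python 3.

```python
import importlib
import re

# Minimum versions taken from the package list above.
# Keys are import names (an assumption: import name == package name here).
MINIMUM_VERSIONS = {
    "pandas": "0.11.1",
    "numpy": "1.6.1",
    "matplotlib": "1.0.0",
    "IPython": "0.12",
}


def version_tuple(version):
    """Turn a string such as '0.11.1' into a tuple of ints, (0, 11, 1).

    Only the leading digits of each dot-separated piece are used, so a
    suffix such as 'rc1' is ignored. Parsing stops at the first piece
    that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_packages(minimums):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        # Assumption: the package publishes its version as __version__.
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[name] = "unknown version"
        elif version_tuple(installed) >= version_tuple(minimum):
            report[name] = "ok ({})".format(installed)
        else:
            report[name] = "too old ({} < {})".format(installed, minimum)
    return report
```

Calling `check_packages(MINIMUM_VERSIONS)` in an interactive session returns one status per package, so a missing or outdated dependency shows up before the first exercise rather than in the middle of it.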