cheat sheet Data as NumPy array: positions = [1.0, 2.0, 3.0, 4.0] df.values plt.xticks(positions, labels) plt.yticks(positions, labels) Save data as CSV file: df.to_csv('output.csv', sep=",") Select area to plot: All of the following code examples refer to this table: plt.axis([0.0, 2.5, 0.0, 10.0]) Format a data frame as tabular string: # [from x, to x, from y, to y] df= df.to_string() col1 col2 Label diagram and axes: Convert a data frame to a dictionary: plt.title('Correlation') A 1 4 df.to_dict() plt.xlabel('Nunstück') plt.ylabel('Slotermeyer') B 2 5 Save a data frame as an Excel table: df.to_excel('output.xlsx') Save most recent diagram: C 3 6 plt.savefig('plot.png') (requires package xlwt) plt.savefig('plot.png', dpi=300) plt.savefig('plot.svg') Getting started Visualization Import pandas: import pandas as pd Import matplotlib: import pylab as plt Create a series: s = pd.Series([1, 2, 3], index=['A', 'B', 'C'], Start a new diagram: name='col1') plt.figure() Create a data frame: Scatter plot: data = [[1, 4], [2, 5], [3, 6]] df.plot.scatter('col1', 'col2', style='ro') index = ['A', 'B', 'C'] df = pd.DataFrame(data, index=index, Bar plot: columns=['col1', 'col2']) df.plot.bar(x='col1', y='col2', width=0.7) Load a data frame: Area plot: df = pd.read_csv('filename.csv', df.plot.area(stacked=True, alpha=1.0) sep=',', names=['col1', 'col2'], Box-and-whisker plot: index_col=0, df.plot.box() encoding='utf-8', Text by Kristian Rother, Thomas Lotze (CC-BY-SA 4.0) nrows=3) Histogram over one column: df['col1'].plot.hist(bins=3) https://www.cusy.io/de/seminare Selecting rows and columns Histogram over all columns: df.plot.hist(bins=3, alpha=0.5) Select single column: df['col1'] Select multiple columns: Merge multiple data frames horizontally: Count unique values: df[['col1', 'col2']] df3 = pd.DataFrame([[1, 7], [8, 9]], df['col1'].value_counts() index=['B', 'D'], Show first n rows: columns=['col1', 'col3']) Summarize descriptive statistics: df.head(2) df.describe() Only merge complete rows (INNER JOIN): Show last n rows: df.merge(df3) df.tail(2) Hierarchical indexing Left column stays complete (LEFT OUTER JOIN): Select rows by index values: df.merge(df3, how='left') Create hierarchical index: df.loc['A'] df.stack() df.loc[['A', 'B']] Right column stays complete (RIGHT OUTER JOIN): df.merge(df3, how='right') Dissolve hierarchical index: Select rows by position: df.unstack() df.loc[1] Preserve all values (OUTER JOIN): df.loc[1:] df.merge(df3, how='outer') Aggregation Merge rows by index: Data wrangling df.merge(df3, left_index=True, right_index=True) Create group object: g = df.groupby('col1') Filter by value: Fill NaN values: df[df['col1'] > 1] df.fillna(0.0) Iterate over groups: for i, group in g: Sort by columns: Apply your own function: print(i, group) df.sort_values(['col2', 'col2'], ascending=[False, True]) def func(x): return 2**x df.apply(func) Aggregate groups: Identify duplicate rows: g.sum() df.duplicated() g.prod() Arithmetics and statistics g.mean() Identify unique rows: g.std() df['col1'].unique() Add to all values: g.describe() df + 10 Swap rows and columns: Select columns from groups: df = df.transpose() Sum over columns: g['col2'].sum() df.sum() g[['col2', 'col3']].sum() Remove a column: del df['col2'] Cumulative sum over columns: Transform values: df.cumsum() import math Clone a data frame: g.transform(math.log) clone = df.copy() Mean over columns: df.mean() Apply a list function on each group: Connect multiple data frames vertically: def strsum(group): df2 = df + 10 Standard devieation over columns: return ''.join([str(x) for x in group.values]) pd.concat([df, df2]) df.std() g['col2'].apply(strsum)