You are on page 1of 2

Data export Set tick marks:

labels = ['A', 'B', 'C', 'D'] Pandas


cheat sheet
Data as NumPy array: positions = [1.0, 2.0, 3.0, 4.0]
df.values plt.xticks(positions, labels)
plt.yticks(positions, labels)
Save data as CSV file:
df.to_csv('output.csv', sep=",") Select area to plot: All of the following code examples refer to this table:
plt.axis([0.0, 2.5, 0.0, 10.0])
Format a data frame as tabular string: # [from x, to x, from y, to y] df=
df.to_string() col1 col2
Label diagram and axes:
Convert a data frame to a dictionary: plt.title('Correlation') A 1 4
df.to_dict() plt.xlabel('Nunstück')
plt.ylabel('Slotermeyer') B 2 5
Save a data frame as an Excel table:
df.to_excel('output.xlsx') Save most recent diagram: C 3 6
plt.savefig('plot.png')
(requires package xlwt) plt.savefig('plot.png', dpi=300)
plt.savefig('plot.svg') Getting started
Visualization Import pandas:
import pandas as pd
Import matplotlib:
import pylab as plt Create a series:
s = pd.Series([1, 2, 3], index=['A', 'B', 'C'],
Start a new diagram: name='col1')
plt.figure()
Create a data frame:
Scatter plot: data = [[1, 4], [2, 5], [3, 6]]
df.plot.scatter('col1', 'col2', style='ro') index = ['A', 'B', 'C']
df = pd.DataFrame(data, index=index,
Bar plot: columns=['col1', 'col2'])
df.plot.bar(x='col1', y='col2', width=0.7)
Load a data frame:
Area plot: df = pd.read_csv('filename.csv',
df.plot.area(stacked=True, alpha=1.0) sep=',',
names=['col1', 'col2'],
Box-and-whisker plot: index_col=0,
df.plot.box() encoding='utf-8',
Text by Kristian Rother, Thomas Lotze (CC-BY-SA 4.0) nrows=3)
Histogram over one column:
df['col1'].plot.hist(bins=3) https://www.cusy.io/de/seminare
Selecting rows and columns
Histogram over all columns:
df.plot.hist(bins=3, alpha=0.5) Select single column:
df['col1']
Select multiple columns: Merge multiple data frames horizontally: Count unique values:
df[['col1', 'col2']] df3 = pd.DataFrame([[1, 7], [8, 9]], df['col1'].value_counts()
index=['B', 'D'],
Show first n rows: columns=['col1', 'col3']) Summarize descriptive statistics:
df.head(2) df.describe()
Only merge complete rows (INNER JOIN):
Show last n rows: df.merge(df3)
df.tail(2) Hierarchical indexing
Left column stays complete (LEFT OUTER JOIN):
Select rows by index values: df.merge(df3, how='left') Create hierarchical index:
df.loc['A'] df.stack()
df.loc[['A', 'B']] Right column stays complete (RIGHT OUTER JOIN):
df.merge(df3, how='right') Dissolve hierarchical index:
Select rows by position: df.unstack()
df.loc[1] Preserve all values (OUTER JOIN):
df.loc[1:] df.merge(df3, how='outer')
Aggregation
Merge rows by index:
Data wrangling df.merge(df3, left_index=True, right_index=True) Create group object:
g = df.groupby('col1')
Filter by value: Fill NaN values:
df[df['col1'] > 1] df.fillna(0.0) Iterate over groups:
for i, group in g:
Sort by columns: Apply your own function: print(i, group)
df.sort_values(['col2', 'col2'], ascending=[False, True]) def func(x): return 2**x
df.apply(func) Aggregate groups:
Identify duplicate rows: g.sum()
df.duplicated() g.prod()
Arithmetics and statistics g.mean()
Identify unique rows: g.std()
df['col1'].unique() Add to all values: g.describe()
df + 10
Swap rows and columns: Select columns from groups:
df = df.transpose() Sum over columns: g['col2'].sum()
df.sum() g[['col2', 'col3']].sum()
Remove a column:
del df['col2'] Cumulative sum over columns: Transform values:
df.cumsum() import math
Clone a data frame: g.transform(math.log)
clone = df.copy() Mean over columns:
df.mean() Apply a list function on each group:
Connect multiple data frames vertically: def strsum(group):
df2 = df + 10 Standard devieation over columns: return ''.join([str(x) for x in group.values])
pd.concat([df, df2]) df.std() g['col2'].apply(strsum)

You might also like