
Synopsis

This notebook presents an overview of plotting data and, more importantly, the benefits of visualizing data
and the pitfalls of not visualizing it.

It uses matplotlib, the most common plotting package for Python.

The data it uses is called Anscombe's quartet:

developed by Frank Anscombe
designed to show the importance of statistical graphs
contains 4 sets of data
each set contains 2 continuous variables
each set has nearly the same mean, variance, correlation and regression line

Only when the data is visualized does it become apparent that the datasets do not follow the same pattern.
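
The claim that the four sets share almost identical summary statistics can be checked before any plotting. The following is a minimal sketch using only the Python standard library. It assumes the commonly published quartet values, typed in by hand rather than read from the notebook's Excel file, and the names `quartet`, `summarize` and `summaries` are illustrative, not part of the notebook.

```python
from statistics import mean, variance

# Published values of Anscombe's quartet (assumption: typed in by hand,
# not loaded from plotting_data.xls). Datasets I-III share the same x values.
x_common = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_common, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_common, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_common, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, Pearson correlation and the
    least-squares line (slope, intercept) for paired values x and y."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)
    # sample covariance, using the same n - 1 denominator as statistics.variance
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = cov / vx
    return {
        "mean_x": mx, "var_x": vx,
        "mean_y": my, "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Every dataset should report a mean of x of 9, a variance of x of 11, a mean of y of about 7.5, a variance of y of about 4.12 to 4.13, a correlation of about 0.82 and a regression line close to y = 3 + 0.5x, which is why the numbers alone cannot tell the four sets apart.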

In [ ]: import pandas as pd

# format for floats

pd.options.display.float_format = '{:,.2f}'.format

df = pd.read_excel(io="../Data/plotting_data.xls", sheet_name='anscombe', index_col='ID')

Matplotlib
%matplotlib inline is an IPython magic command.

It tells the notebook to embed matplotlib charts directly in the cell output instead of opening them in a separate window.

In [ ]: import matplotlib.pyplot as plt

%matplotlib inline

In [ ]: # plot only dataset 'I'

fltr = df['dataset'] == 'I'

df_I = df[fltr]

# extract the x and y values

xI = df_I['x']

yI = df_I['y']

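# plot: the default style joins the points with lines, in row order

plt.plot(xI, yI)

# the 'o' format string draws circle markers only, giving a scatter plot

plt.plot(xI, yI, 'o')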

Plot all 4 datasets on a single figure


Extract the 4 datasets from the dataframe
Create a figure & add 4 sub plots to the figure
Plot each dataset on its corresponding sub_plot
Add titles and other labels

In [ ]: # Extract the 4 datasets from the dataframe
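
One possible way to carry out the four steps listed above is sketched here. It is an illustration under stated assumptions, not the notebook's own solution: the dataframe is rebuilt from the commonly published quartet values so the block runs on its own (the real `df` is also indexed by `ID`, which this stand-in omits), and the names `quartet`, `subsets`, `fig` and `axes` are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the notebook's df (assumption: long format with columns
# 'dataset', 'x', 'y'), built from the published quartet values.
x_common = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_common, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_common, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_common, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
df = pd.DataFrame(
    [(name, x, y) for name, (xs, ys) in quartet.items() for x, y in zip(xs, ys)],
    columns=["dataset", "x", "y"],
)

# 1. Extract the 4 datasets from the dataframe with boolean filters
subsets = {name: df[df["dataset"] == name] for name in ["I", "II", "III", "IV"]}

# 2. Create a figure with a 2x2 grid of subplots; shared axes keep the
#    four panels on the same scale so they can be compared directly
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True, figsize=(8, 6))

# 3. Plot each dataset as markers on its corresponding subplot
for ax, (name, subset) in zip(axes.flat, subsets.items()):
    ax.plot(subset["x"], subset["y"], "o")
    # 4. Add a title to each subplot
    ax.set_title(f"Dataset {name}")

# 4. (continued) Add an overall title and tidy the spacing
fig.suptitle("Anscombe's quartet")
fig.tight_layout()
```

Sharing the x and y axes is a design choice rather than a requirement; it makes the outliers in datasets III and IV easier to compare against the other panels.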
