Professional Documents
Culture Documents
Seaborn
I N T R O D U C T I O N T O D ATA V I S U A L I Z AT I O N W I T H S E A B O R N
Erin Case
Data Scientist
What is Seaborn?
Python data visualization library
Erin Case
Data Scientist
What is pandas?
Python library for data analysis
Easily read datasets from csv, txt, and other types of les
Erin Case
Data Scientist
Tips dataset
import pandas as pd
import seaborn as sns
tips = sns.load_dataset("tips")
tips.head()
sns.scatterplot(x="total_bill",
y="tip",
data=tips)
plt.show()
sns.scatterplot(x="total_bill",
y="tip",
data=tips,
hue="smoker")
plt.show()
sns.scatterplot(x="total_bill",
y="tip",
data=tips,
hue="smoker",
hue_order=["Yes",
"No"])
plt.show()
plt.show()
sns.scatterplot(x="total_bill",
y="tip",
data=tips,
hue="smoker",
palette=hue_colors)
plt.show()
sns.countplot(x="smoker",
data=tips,
hue="sex")
plt.show()
Erin Case
Data Scientist
Questions about quantitative variables
Relational plots
sns.scatterplot(x="total_bill", sns.relplot(x="total_bill",
y="tip", y="tip",
data=tips) data=tips,
kind="scatter")
plt.show()
plt.show()
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
col="smoker")
plt.show()
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
row="smoker")
plt.show()
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
col="smoker",
row="time")
plt.show()
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
col="day",
col_wrap=2)
plt.show()
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
col="day",
col_wrap=2,
col_order=["Thur",
"Fri",
"Sat",
"Sun"])
plt.show()
Erin Case
Data Scientist
Scatter plot overview
Show relationship between two quantitative variables
We've seen:
New Customizations:
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
size="size")
plt.show()
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
size="size",
hue="size")
plt.show()
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
hue="smoker",
style="smoker")
plt.show()
Erin Case
Data Scientist
What are line plots?
Two types of relational plots: sca er plots and
line plots
Sca er plots
Line plots
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_mean,
kind="scatter")
plt.show()
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_mean,
kind="line")
plt.show()
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_loc_mean,
kind="line",
style="location",
hue="location")
plt.show()
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_loc_mean,
kind="line",
style="location",
hue="location",
markers=True)
plt.show()
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_loc_mean,
kind="line",
style="location",
hue="location",
markers=True,
dashes=False)
plt.show()
sns.relplot(x="hour", y="NO_2",
data=air_df,
kind="scatter")
plt.show()
sns.relplot(x="hour", y="NO_2",
data=air_df,
kind="line")
plt.show()
sns.relplot(x="hour", y="NO_2",
data=air_df,
kind="line",
ci="sd")
plt.show()
sns.relplot(x="hour", y="NO_2",
data=air_df,
kind="line",
ci=None)
plt.show()
Erin Case
Data Scientist
Categorical plots
Examples: count plots, bar plots
sns.countplot(x="how_masculine",
data=masculinity_data)
plt.show()
sns.catplot(x="how_masculine",
data=masculinity_data,
kind="count")
plt.show()
sns.catplot(x="day",
y="total_bill",
data=tips,
kind="bar")
plt.show()
sns.catplot(x="day",
y="total_bill",
data=tips,
kind="bar",
ci=None)
plt.show()
sns.catplot(x="total_bill",
y="day",
data=tips,
kind="bar")
plt.show()
Erin Case
Data Scientist
What is a box plot?
Shows the distribution of quantitative data
g = sns.catplot(x="time",
y="total_bill",
data=tips,
kind="box")
plt.show()
g = sns.catplot(x="time",
y="total_bill",
data=tips,
kind="box",
order=["Dinner",
"Lunch"])
plt.show()
g = sns.catplot(x="time",
y="total_bill",
data=tips,
kind="box",
sym="")
plt.show()
g = sns.catplot(x="time",
y="total_bill",
data=tips,
kind="box",
whis=[0, 100])
plt.show()
Erin Case
Data Scientist
What are point plots?
Points show mean of quantitative variable
Di erences:
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point",
join=False)
plt.show()
sns.catplot(x="smoker",
y="total_bill",
data=tips,
kind="point")
plt.show()
sns.catplot(x="smoker",
y="total_bill",
data=tips,
kind="point",
estimator=median)
plt.show()
sns.catplot(x="smoker",
y="total_bill",
data=tips,
kind="point",
capsize=0.2)
plt.show()
sns.catplot(x="smoker",
y="total_bill",
data=tips,
kind="point",
ci=None)
plt.show()
Erin Case
Data Scientist
Why customize?
Reasons to change style:
Personal preference
Improve readability
Guide interpretation
sns.set_style()
plt.show()
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
sns.set_palette()
sns.catplot(x="how_masculine",
data=masculinity_data,
kind="count",
order=category_order)
plt.show()
sns.catplot(x="how_masculine",
data=masculinity_data,
kind="count",
order=category_order)
plt.show()
sns.set_palette(custom_palette)
sns.set_palette(custom_palette)
sns.set_context()
plt.show()
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
Erin Case
Data Scientist
Creating informative visualizations
> matplotlib.axes._subplots.AxesSubplot
g.fig.suptitle("New Title",
y=1.03)
plt.show()
Erin Case
Data Scientist
Adding a title to AxesSubplot
FacetGrid AxesSubplot
g = sns.catplot(x="Region", g = sns.boxplot(x="Region",
y="Birthrate", y="Birthrate",
data=gdp_data, data=gdp_data)
kind="box")
g.fig.suptitle("New Title",
y=1.03)
g.fig.suptitle("New Title",
y=1.03)
g.set_titles("This is {col_name}")
plt.show()
Erin Case
Data Scientist
Getting started
To import Seaborn:
To import Matplotlib:
To show a plot:
plt.show()
sns.relplot(x="x_variable_name",
y="y_variable_name",
data=pandas_df,
kind="scatter")
sns.catplot(x="x_variable_name",
y="y_variable_name",
data=pandas_df,
kind="bar")
plt.xticks(rotation=90)
Erin Case
Data Scientist
Where does Seaborn fit in?
Next steps:
Next steps:
Python
SQL
Next steps:
Cleaning data
Next steps:
Statistical analysis