Seaborn Plotting and Visualization

Plotting and Visualization
Lecture 5
Sergio Caballero, Ph.D.

sergioac@mit.edu
SCM.254 Applied Programming and Data Analysis in Python | L5: Plotting and Visualization
Outline
• Introduction to seaborn
• Plotting numerical data
• Plotting categorical data
• Plotting the distribution of a dataset
Introduction to Seaborn
seaborn
• Seaborn is a Python data visualization library based
on matplotlib.
seaborn.set_style
• Set the aesthetic style of the plots.
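A minimal sketch of set_style(), assuming seaborn and matplotlib are installed. The choice of "whitegrid" is only an illustration; the built-in style names are darkgrid, whitegrid, dark, white, and ticks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import seaborn as sns

# Apply one of the five built-in styles to all subsequent plots
sns.set_style("whitegrid")

# axes_style() with no arguments returns the style parameters currently in effect
current = sns.axes_style()
grid_on = current["axes.grid"]  # the two *grid styles switch the axes grid on
print(grid_on)
```

Calling set_style() once near the top of a script or notebook is usually enough, since it changes the matplotlib defaults globally.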
Choosing color palettes
• Seaborn makes it easy to select and use color palettes that
are suited to the kind of data you are working with.
• Qualitative color palettes • Sequential color palettes • Diverging color palettes
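The three palette families can be sketched with color_palette(). The palette names below ("Set2", "Blues", "coolwarm") are illustrative choices, not necessarily the ones shown on the slide.

```python
import seaborn as sns

# Qualitative: distinct hues for categories with no inherent order
qualitative = sns.color_palette("Set2", 8)

# Sequential: light-to-dark ramp for values that run from low to high
sequential = sns.color_palette("Blues", 6)

# Diverging: two hues that meet at a neutral midpoint
diverging = sns.color_palette("coolwarm", 7)

# Each palette is a list of RGB tuples
print(len(qualitative), len(sequential), len(diverging))
```

Any of these lists can be passed to the palette argument of a seaborn plotting function.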
Plotting numerical data
Visualizing statistical relationships
• The goal is to understand how numerical variables in a
dataset relate to each other.
• relplot() is a function for visualizing statistical relationships
producing two common plots:
§ scatter plots
§ line plots
relplot()
relplot(x='A', y='B', hue='C', size='D', style='E',
kind='line', data=df)
• Arguments:
§ x,y: names of variables (columns); must be numeric
§ hue, size, style: grouping variable that produces elements
with different colors, sizes or styles; categorical or numeric
§ kind: kind of plot to draw; 'scatter' (the default) or 'line'
§ data: DataFrame
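A sketch of a relplot() call that uses the grouping arguments above. The routes DataFrame here is a hypothetical stand-in with made-up column names and values, since the lecture's data is not reproduced in these notes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data: one row per route (values invented for illustration)
routes = pd.DataFrame({
    "distance_km": [12, 18, 25, 31, 40, 47, 55, 63],
    "duration_min": [35, 48, 61, 70, 95, 104, 118, 140],
    "vehicle": ["van", "van", "truck", "van", "truck", "truck", "van", "truck"],
    "stops": [4, 6, 7, 9, 11, 12, 14, 16],
})

g = sns.relplot(
    x="distance_km", y="duration_min",
    hue="vehicle",    # color encodes a categorical column
    size="stops",     # marker size encodes a numeric column
    kind="scatter",   # the default, written out for clarity
    data=routes,
)
```

relplot() returns a FacetGrid; the underlying matplotlib axes is available as g.ax when there is a single panel.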
Scatter plots
• Scatter plots are a useful way of examining the relationship
between two one-dimensional datasets.
• Example:
routes:
Scatter plots (Cont.)
Line plots
• A line plot displays information as a series of data points
connected by straight line segments.
• Example:
route_r001:
Line plots (Cont.)
Plotting categorical data
• catplot() is a function that shows the relationship
between a numerical and one or more categorical variables.
• It can be used for:
§ Categorical scatter plots
§ Categorical distribution plots
o Boxplot
o Violinplot
catplot()
catplot(x='A', y='B', hue='C', order=['A1', 'A2'],
kind='box', data=df)
• Arguments:
§ x,y: names of variables (columns)
§ hue: grouping variable that produces elements with different
colors
§ kind: kind of plot to draw; 'strip' (the default), 'box',
'violin'
§ order: order to plot the categorical levels
§ data: DataFrame
Categorical scatter plot
• A categorical scatter plot represents categorical data with a
scatter plot.
• Example:
data:
Categorical scatter plot (Cont.)
• Overlapping points • Categorical variable on y-axis
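The two variants named above can be sketched with catplot() and its default kind='strip'. The data DataFrame is hypothetical, and jitter=False is an assumption about how the overlapping-points panel was produced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data: several duration observations per route
data = pd.DataFrame({
    "route": ["R001"] * 5 + ["R002"] * 5,
    "duration_min": [62, 58, 65, 70, 61, 80, 84, 79, 90, 83],
})

# Default strip plot: points get a small random horizontal jitter
g_jitter = sns.catplot(x="route", y="duration_min", data=data)

# jitter=False puts every point of a category on the same vertical line,
# so points with similar values overlap
g_overlap = sns.catplot(x="route", y="duration_min", jitter=False, data=data)

# Swapping x and y puts the categorical variable on the y-axis
g_horizontal = sns.catplot(x="duration_min", y="route", data=data)
```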
Boxplots
• A boxplot shows the three quartile values of the distribution
along with extreme values. The whiskers extend to points
that lie within 1.5 IQRs of the lower and upper quartile.
data:
Boxplots (Cont.)
• Categorical variable on x-axis • Categorical variable on y-axis
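A sketch of a boxplot via catplot(), followed by a hand calculation of the 1.5 IQR whisker rule described above. The data DataFrame and its values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data; R001 contains one unusually long trip (95)
data = pd.DataFrame({
    "route": ["R001"] * 6 + ["R002"] * 6,
    "duration_min": [58, 60, 62, 64, 66, 95, 78, 80, 82, 84, 86, 88],
})

# order controls the left-to-right placement of the categories
g = sns.catplot(x="route", y="duration_min", kind="box",
                order=["R002", "R001"], data=data)

# Reproduce the whisker rule by hand for R001
r001 = data.loc[data["route"] == "R001", "duration_min"]
q1, q3 = r001.quantile(0.25), r001.quantile(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Observations outside the fences are drawn as individual points
outliers = r001[(r001 < lower_fence) | (r001 > upper_fence)]
print(outliers.tolist())
```

Passing x='duration_min' and y='route' instead would draw the same boxes horizontally.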
Violin plots
• A violin plot combines a boxplot with an estimation of the
distribution of values. The quartiles and whiskers are shown
inside the violin.
data:
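A violin plot needs only a different kind value. In this sketch the data DataFrame is hypothetical, and inner='box' (the seaborn default) is written out because it is what draws the quartiles and whiskers inside each violin.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data: several duration observations per route
data = pd.DataFrame({
    "route": ["R001"] * 6 + ["R002"] * 6,
    "duration_min": [58, 60, 62, 64, 66, 71, 78, 80, 82, 84, 86, 88],
})

# The violin outline is a kernel density estimate of each group's values;
# inner="box" overlays a miniature boxplot inside it
g = sns.catplot(x="route", y="duration_min", kind="violin",
                inner="box", data=data)
```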
Bar plots
• barplot() is a function that represents an estimate of
central tendency for a numeric variable (height of each
rectangle) and provides some indication of the uncertainty
around that estimate.
• Bar height: mean • Error bar: measure of uncertainty
barplot()
barplot(x='A', y='B', hue='C', order=['A1', 'A2'],
ci='sd', data=df)
• Arguments:
§ x,y: names of variables (columns); x is categorical, y is numerical
§ hue: grouping variable that produces elements with different
colors
§ order: order to plot the categorical levels
§ ci: size of confidence intervals (float or 'sd'); default 95% CI
§ data: DataFrame
Bar plots (Cont.)
Average duration by route
• 95% confidence interval • Standard deviation
Plotting the distribution of a dataset
• When analyzing your data, often the first thing you will want
to do is to get a sense for how a variable is distributed.
• You can do this by creating a:
§ Histogram (distplot)
§ Density plot (kdeplot)
distplot()
distplot(a=df['A'], bins=x, kde=False)
• Arguments:
§ a: observed data; numerical
§ bins: number of bins; integer
§ kde: whether to plot a Gaussian kernel density estimate; default
True
Histograms
• Histogram and KDE • Only histogram
Histograms (Cont.)
• Default number of bins • Specifying the number of bins
kdeplot()
kdeplot(data=df['A'], shade=True)
• Arguments:
§ data: observed data; numerical
§ shade: if True, area under the KDE curve is shaded
Density plots
Default
References
• Seaborn tutorial: https://seaborn.pydata.org/tutorial.html
