
Lab 4

Seaborn

Introduction
• Seaborn is a library for making statistical graphics in Python.
• It builds on top of matplotlib and integrates closely with pandas data
structures.
• Its plotting functions operate on dataframes and arrays containing whole
datasets and internally perform the necessary semantic mapping and
statistical aggregation to produce informative plots.

• Advantages over matplotlib:
– Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.
– Built-in color palettes that can be used to reveal patterns in the dataset.
– A high-level abstraction that still allows for complex visualizations.

Examples
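import seaborn as sns
tips = sns.load_dataset("tips")   # fetches the example dataset (needs internet access the first time)
tips.head()                       # show the first five rows

sns.relplot(data=tips, x="total_bill", y="tip", hue="day")   # default kind is "scatter"

hue: grouping variable that produces elements with different colors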

Examples
sns.set()   # apply Seaborn's default theme (darkgrid); passing style="ticks" to set() makes the graph look like the previous one
sns.relplot(data=tips, x="total_bill", y="tip", hue="day", col="time")
# col, row: variables that define subsets to plot on different facets

We can add another parameter after col, style="day", so that instead of colored circles each day is drawn with a differently colored marker shape.
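A minimal sketch of the style="day" variant, assuming a small hand-made DataFrame with the same column names as tips. The values are invented so the example runs offline; with network access, sns.load_dataset("tips") gives the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips dataset (invented values, same column names)
tips = pd.DataFrame({
    "total_bill": [10, 15, 20, 25, 12, 18, 22, 30],
    "tip": [1.5, 2.0, 3.0, 3.5, 1.8, 2.5, 3.2, 4.0],
    "day": ["Sun", "Sun", "Sat", "Sat", "Thur", "Thur", "Fri", "Fri"],
    "time": ["Dinner", "Lunch"] * 4,
})

# hue and style both map "day": each day gets its own color AND marker shape;
# col="time" draws one facet (subplot column) per value of "time"
g = sns.relplot(data=tips, x="total_bill", y="tip",
                hue="day", col="time", style="day")
```

Mapping the same variable to both hue and style is redundant on purpose: the groups stay distinguishable even when printed in grayscale.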

Examples
sns.relplot(data=tips, x="total_bill", y="tip", hue="day", col="time", row="sex")   # one facet per (sex, time) combination

Controlling Figure Aesthetics
• Figure aesthetics are controlled in two categories: style and context
– To control the style, Seaborn provides two methods: set_style(style, [rc]) and
axes_style(style, [rc]).

– style: a dictionary of parameters or the name of one of the following preconfigured sets: darkgrid, whitegrid, dark, white, or ticks
– rc (optional): parameter mappings to override the values in the preset Seaborn style dictionaries
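To compare the five preset names side by side, a minimal sketch can create one subplot per style. An axes takes its face color and grid setting from the parameters active when it is created, so calling set_style() just before each add_subplot() gives each panel a different look. The plotted numbers are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import seaborn as sns

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
fig = plt.figure(figsize=(15, 3))
for i, style in enumerate(styles, start=1):
    sns.set_style(style)            # global style used by the next axes
    ax = fig.add_subplot(1, 5, i)   # the axes picks up the style when it is created
    ax.plot([1, 3, 2])
    ax.set_title(style)
```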

Controlling Figure Aesthetics
sns.set_style("darkgrid")
sns.lineplot(x=["A", "B", "C"], y=[1, 3, 2])

# override individual parameters of the preset: lighter grid color, dotted grid lines
sns.set_style("darkgrid", {"grid.color": ".6", "grid.linestyle": ":"})
sns.lineplot(x=["A", "B", "C"], y=[1, 3, 2])

Controlling Figure Aesthetics
• We can also use Seaborn styles with plain matplotlib plots:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()

sns.set()   # darkgrid theme; set_style("darkgrid") or set(style="darkgrid") gives the same style
plt.figure()   # same data as above
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
Controlling Figure Aesthetics
• To control the style temporarily, use axes_style():
• seaborn.axes_style(style, [rc]) returns a parameter dictionary for the aesthetic style of the plots.
The function can be used in a with statement to temporarily change the style parameters.
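A minimal sketch of the temporary override, assuming "white" is the session's default style: axes created inside the with block get the darkgrid look, and axes created after the block are back to white. Called outside a with statement, axes_style() only returns the dictionary and changes nothing.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("white")                 # global default for this session

params = sns.axes_style("darkgrid")    # on its own: just a dict of rc parameters
print(params["axes.facecolor"])

with sns.axes_style("darkgrid"):       # temporary override inside the block
    fig1, ax1 = plt.subplots()
    ax1.plot([1, 3, 2])

fig2, ax2 = plt.subplots()             # created after the block: white style again
ax2.plot([1, 3, 2])
```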

Scatter Plot
• To remove axes spines
– seaborn.despine(fig=None, ax=None, top=True, right=True, left=False, bottom=False, offset=None, trim=False)   # removes the top and right spines by default
– By default it acts on the current figure
– It can be used with matplotlib plots, like the other Seaborn style functions

– sinplot()   # any function that draws a plot; sinplot() is not part of Seaborn
– sns.despine()
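A minimal sketch of despine() on a pure matplotlib figure. Since the slide does not define sinplot(), the version below is a hypothetical helper that draws a few phase-shifted sine waves.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

def sinplot(n=6):
    """Hypothetical helper: draw n phase-shifted sine waves on the current axes."""
    x = np.linspace(0, 14, 100)
    for i in range(1, n + 1):
        plt.plot(x, np.sin(x + i * 0.5) * (n + 2 - i))

sns.set_style("ticks")
fig, ax = plt.subplots()
sinplot()
sns.despine()   # no fig/ax given: hides the top and right spines of the current figure
```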

For more style functions, such as set_context(), refer to Chapter 4 of the book.

set_context() sets the parameters that control the scaling of plot elements. This affects things like the size of the labels, lines, and other elements of the plot.
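A minimal sketch of set_context(). Seaborn's four contexts are "paper", "notebook", "talk", and "poster"; plotting_context() returns the scaled parameters without applying them, while set_context() applies them globally.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import seaborn as sns

# Inspect the scaling without changing anything
notebook = sns.plotting_context("notebook")
talk = sns.plotting_context("talk")
print(notebook["font.size"], talk["font.size"])

sns.set_context("talk")   # apply globally: larger fonts, thicker lines
fig, ax = plt.subplots()
ax.plot([1, 3, 2])
ax.set_title("Scaled for a talk")
```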

Ex 1: Comparing IQ Scores for Different Test Groups by Using a Box Plot
• Use the whitegrid style, set the context to talk, and remove all axes spines, except
the one on the bottom. Add a title to the plot.
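A possible sketch of Ex 1. The IQ dataset is not shown on the slide, so the code generates synthetic scores (normal, mean 100, SD 15) for four hypothetical groups; replace df with the real data when solving the exercise.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the IQ data (hypothetical group names)
rng = np.random.default_rng(0)
groups = ["Group A", "Group B", "Group C", "Group D"]
df = pd.DataFrame({
    "Group": np.repeat(groups, 50),
    "IQ score": rng.normal(loc=100, scale=15, size=200),
})

sns.set_style("whitegrid")
sns.set_context("talk")
fig, ax = plt.subplots(figsize=(8, 6))
sns.boxplot(data=df, x="Group", y="IQ score", ax=ax)
sns.despine(ax=ax, left=True)   # top/right go by default; left=True also removes the left spine
ax.set_title("IQ scores for different test groups")
```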

Heatmap
• A heatmap is a visualization where values contained in a matrix are represented as colors or
color saturation. (refer to color palettes in book)
(https://seaborn.pydata.org/generated/seaborn.color_palette.html)

• Heatmaps are great for visualizing multivariate data (comparing more than two variables): they show the relationships between three variables on a 2D plane, where two categorical variables are placed in the rows and columns and a third, numerical or categorical, variable is encoded as the cell color.
• These relationships can be complicated, which is why color is used.
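A minimal sketch of the three-variable idea, assuming hypothetical long-form sales data: pivot() puts one categorical variable on the rows and another on the columns, and heatmap() encodes the numeric value as color (annot=True also prints each value in its cell).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: two categorical variables and one numeric value
long_df = pd.DataFrame({
    "store": ["North", "North", "North", "South", "South", "South"],
    "product": ["A", "B", "C", "A", "B", "C"],
    "sales": [12, 7, 3, 5, 9, 14],
})

# Rows = store, columns = product, cell color = sales
matrix = long_df.pivot(index="store", columns="product", values="sales")
ax = sns.heatmap(matrix, annot=True, fmt="d")
```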

Ex 2: Using Heatmaps to Find Patterns in
Flight Passengers’ Data
• We will use a heatmap to find the patterns in the flight passengers' data
– Use your own color map. Make sure that the lowest value is the darkest color and that the highest value is the brightest color.
– sns.heatmap(data, cmap=cmap)   # data: 2D dataset
– # cmap: the mapping from data values to color space (a list of colors or a matplotlib colormap)
• For more arguments, check https://seaborn.pydata.org/generated/seaborn.heatmap.html
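One way to build "your own color map" is matplotlib's LinearSegmentedColormap.from_list, blending from a dark color (lowest value) to a bright one (highest value). The two hex colors and the grid values below are arbitrary choices for illustration; the comment shows how the real flights matrix can be built when network access is available.

```python
import calendar
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap

# Custom colormap: dark navy for the lowest value, pale yellow for the highest
cmap = LinearSegmentedColormap.from_list("dark_to_bright", ["#0b0b3b", "#f5f5a0"])

# Synthetic month x year grid standing in for the flights data. With network access:
# sns.load_dataset("flights").pivot(index="month", columns="year", values="passengers")
data = pd.DataFrame(
    np.arange(36).reshape(12, 3) * 10 + 100,
    index=calendar.month_abbr[1:13],
    columns=[1949, 1950, 1951],
)
ax = sns.heatmap(data, cmap=cmap)
```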

Bar Plot
• We already explained how to create bar plots with Matplotlib. Creating bar plots with subgroups was quite tedious, but Seaborn offers a very convenient way to create various bar plots.

import pandas as pd
import seaborn as sns

data = pd.read_csv("data/salary.csv")
sns.set(style="whitegrid")
sns.barplot(x="Education", y="Salary", hue="District", data=data)   # bar height = mean Salary per group
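Because data/salary.csv is not included here, a minimal sketch with invented numbers (same column names) shows what barplot() aggregates: each bar is the mean of its (Education, District) group, which can be checked against a pandas groupby.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for data/salary.csv (invented values, same column names)
data = pd.DataFrame({
    "Education": ["Bachelor", "Bachelor", "Master", "Master"] * 2,
    "District": ["North"] * 4 + ["South"] * 4,
    "Salary": [40, 44, 50, 54, 38, 42, 56, 60],
})

sns.set(style="whitegrid")
# Each bar shows the MEAN salary of its group; the line on top is an error bar
# (by default a 95% confidence interval of that mean)
ax = sns.barplot(x="Education", y="Salary", hue="District", data=data)

means = data.groupby(["Education", "District"])["Salary"].mean()
print(means)
```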

Ex 3: Movies Revisited


Univariate Distribution
• Seaborn offers handy functions to examine univariate distributions (summarizing one variable at a time) and bivariate distributions (comparing two variables).

• Seaborn uses displot() to represent univariate distributions. By default it draws a histogram; it can also draw a kernel density estimation (KDE) fit.
• penguins = sns.load_dataset("penguins")
• sns.displot(data=penguins, x="flipper_length_mm")   # default kind is "hist"

Univariate Distribution

sns.displot(data=penguins, x="flipper_length_mm", kind="kde")

We can add a KDE curve on top of the histogram:
sns.displot(data=penguins, x="flipper_length_mm", kde=True)


Bivariate Distribution
For visualizing bivariate distributions, we will introduce three different plots. The first two plots use the jointplot() function, which draws a bivariate plot together with the univariate marginal distribution of each variable.
• Example:
• penguins = sns.load_dataset("penguins")
• sns.jointplot(data=penguins,x="bill_length_mm", y="bill_depth_mm")

Bivariate Distribution
• Assigning a hue variable adds colors to the scatter plot and draws separate density curves (using kdeplot()) on the marginal axes.
• Note: kdeplot() can also be called directly to plot a univariate or bivariate KDE.

sns.jointplot(data=penguins,
x="bill_length_mm",
y="bill_depth_mm",
hue="species")

Pairwise Relationships
For visualizing multiple pairwise bivariate distributions in a dataset,
Seaborn offers the pairplot() function.

This function creates a matrix where off-diagonal elements visualize the relationship between each pair of variables and the diagonal elements show the marginal distributions.
import pandas as pd
import seaborn as sns

mydata = pd.read_csv("data/basic_details.csv")
sns.set(style="ticks")
g = sns.pairplot(mydata, hue="Groups")

# If we don't specify hue, the diagonal shows histograms instead of KDE curves

For more examples you can refer to:
https://seaborn.pydata.org/generated/seaborn.pairplot.html

Violin Plot
• Violin plots are a method of plotting numeric data and can be considered a
combination of the box plot with a kernel density plot.

• The width of each curve corresponds with the approximate frequency of data
points in each region.

• Violin plots are used when you want to observe the distribution of numeric data,
and are especially useful when you want to make a comparison of distributions
between multiple groups.

Violin Plot
• tips = sns.load_dataset("tips")
• ax = sns.violinplot(x="day", y="total_bill", hue="sex", data=tips)
ax.set_title('Distribution of total bill amount per day', fontsize=16)

Ex4: Comparing IQ Scores for Different Test
Groups by Using a Violin Plot
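A possible sketch of Ex 4, again using synthetic IQ scores for four hypothetical groups because the dataset is not shown on the slide. Each violin's width follows the estimated density of scores in that group.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the IQ data (hypothetical group names)
rng = np.random.default_rng(0)
groups = ["Group A", "Group B", "Group C", "Group D"]
df = pd.DataFrame({
    "Group": np.repeat(groups, 50),
    "IQ score": rng.normal(loc=100, scale=15, size=200),
})

sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(8, 6))
sns.violinplot(data=df, x="Group", y="IQ score", ax=ax)
ax.set_title("IQ scores for different test groups")
```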
