
Unit 4: Introduction to Data Visualization

CSA202: Data Analytics and Visualization


Ms. Pema Wangmo

Gyalpozhing College of Information Technology


Royal University of Bhutan
BCS Year II, Sem I
SEABORN
Learning objectives
In this unit, you will learn about

1. Multiple plots with seaborn

2. Overview of seaborn plots


Multiple plots with seaborn
Using matplotlib
● Matplotlib provides various functions for plotting subplots. Some of them are add_axes(),
subplot(), and subplot2grid().

● Let’s see an example of each function for better understanding.


add_axes()
● To add an Axes to a figure, we use the Figure class from the figure module of the matplotlib library.

● The Figure method used to add an Axes is add_axes().

● Syntax: figure.add_axes(*args, **kwargs)

● Parameters:

○ rect: [left, bottom, width, height] (Sets the position and size of the new axes, given as fractions of the
figure width and height. The values are floats.)

○ projection: None, ‘aitoff’, ‘hammer’, ‘lambert’, ‘mollweide’, ‘polar’, ‘rectilinear’. By default, the projection is
None, which results in a ‘rectilinear’ projection.

○ polar: bool (By default, polar is set to False. If we set it to True, the axes use the ‘polar’ projection.)

○ sharex/sharey: Another Axes with which to share the x-axis or y-axis.


● Examples: an inside plot, polar=True, and projection with set_color
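The rect and polar parameters can be sketched as follows. This is a minimal illustration rather than the original slide code; the sine and cosine data and all variable names are made up for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig = plt.figure(figsize=(6, 4))

# Main axes: rect = [left, bottom, width, height] in figure fractions.
main_ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
main_ax.plot(x, np.sin(x), color="tab:blue")
main_ax.set_title("Main plot")

# A smaller axes drawn on top of the main one (an "inside plot").
inset_ax = fig.add_axes([0.6, 0.6, 0.25, 0.25])
inset_ax.plot(x, np.cos(x), color="tab:red")
inset_ax.set_title("Inset")

# A second figure whose axes use the polar projection.
polar_fig = plt.figure(figsize=(4, 4))
polar_ax = polar_fig.add_axes([0.1, 0.1, 0.8, 0.8], polar=True)
polar_ax.plot(x, 1 + np.sin(3 * x))

# In a script, call plt.show() to display the figures.
```

Because rect is expressed in figure fractions, the inset keeps its relative position when the figure is resized.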
subplot()
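A minimal sketch of subplot(), assuming a grid of one row and two columns; the data and names are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig = plt.figure(figsize=(8, 3))

# subplot(nrows, ncols, index): index counts from 1, left to right.
ax_left = plt.subplot(1, 2, 1)
ax_left.plot(x, np.sin(x))
ax_left.set_title("sin(x)")

ax_right = plt.subplot(1, 2, 2)
ax_right.plot(x, np.cos(x))
ax_right.set_title("cos(x)")

# Avoid overlapping titles and tick labels.
fig.tight_layout()
```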
subplot2grid()
● Syntax: plt.subplot2grid(shape, loc, rowspan=1, colspan=1, fig=None)

● Parameters:

○ shape: Shape of the grid in which to place the axis (rows, columns)

○ loc: Location of the axis within the grid (row, column)

○ rowspan: Number of rows for the axis to span downwards.

○ colspan: Number of columns for the axis to span to the right.

○ fig: Figure to place the axis in. Defaults to the current figure. (Figure, optional)
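The parameters above can be sketched on a 3 x 3 grid. The layout below is an arbitrary choice for illustration, not the layout from the original slide.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 6))
grid_shape = (3, 3)  # 3 rows, 3 columns

# Top row: starts at (row 0, column 0) and spans all 3 columns to the right.
ax_top = plt.subplot2grid(grid_shape, (0, 0), colspan=3)
# Middle left: spans 2 columns.
ax_mid = plt.subplot2grid(grid_shape, (1, 0), colspan=2)
# Right side: starts at (row 1, column 2) and spans 2 rows downwards.
ax_side = plt.subplot2grid(grid_shape, (1, 2), rowspan=2)
# Two single cells in the bottom row.
ax_bottom_left = plt.subplot2grid(grid_shape, (2, 0))
ax_bottom_mid = plt.subplot2grid(grid_shape, (2, 1))

for name, ax in [("top", ax_top), ("mid", ax_mid), ("side", ax_side),
                 ("bottom left", ax_bottom_left), ("bottom mid", ax_bottom_mid)]:
    ax.set_title(name)

fig.tight_layout()
```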
Using seaborn: FacetGrid()
● Seaborn also provides some functions for plotting multiple plots.

● The FacetGrid class helps in visualizing the distribution of one variable, as well as the relationship between
multiple variables, separately within subsets of your dataset using multiple panels.

● Syntax: sns.FacetGrid(data, row=None, col=None)

● Parameters

○ data: DataFrame

○ row, col: strings (Variables that define subsets of the data, which will be drawn on separate facets in the grid.
These variables should be categorical.)
Overview of seaborn plots
Seaborn plots
● We can classify all the functions offered by Seaborn into 2 categories:

○ Axes-level functions

○ Figure-level functions

● Axes-level functions are the easiest to understand. They operate on a single Matplotlib "Axes" object
and are intended to be "replacements" of the individual plot types provided by Matplotlib. When
used, they only control one Axes object, and not the entire figure.

● The figure-level functions take control of the entire Matplotlib "Figure" object. They create subplots
within your figure as they see fit. For each subplot, they delegate work to the Axes-level functions.
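The difference between the two categories can be seen in what each function returns. The sketch below assumes a made-up housing table with hypothetical column names.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data for illustration.
houses = pd.DataFrame({
    "lot_size": [30, 35, 42, 50, 58, 66, 75, 90],
    "price": [120, 150, 180, 210, 260, 300, 340, 400],
    "furnished": ["no", "yes", "no", "yes", "no", "yes", "no", "yes"],
})

# Axes-level: draws onto the Axes we pass in and returns that same Axes.
fig, ax = plt.subplots()
returned_ax = sns.scatterplot(data=houses, x="lot_size", y="price", ax=ax)

# Figure-level: creates its own figure (here with one subplot per
# "furnished" value) and returns a FacetGrid that manages it.
grid = sns.relplot(data=houses, x="lot_size", y="price", col="furnished")
```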
Figure: figure-level functions and axes-level functions


Creating different types of plots
1. Relational plots
○ Relational plots are used for visualizing the statistical relationship between variables.

○ Statistical analysis is the process of understanding how the variables in a dataset relate to each
other.

○ To draw relational plots, seaborn provides three functions:

■ relplot()

■ scatterplot()

■ lineplot()
relplot()
● Syntax: seaborn.relplot(x=None, y=None, data=None, kind="scatter", **kwargs)

● The kind parameter selects the underlying axes-level function to use:

○ scatterplot() (with kind="scatter"; the default)

○ lineplot() (with kind="line")

● Parameters:

○ data: DataFrame or NumPy array

○ x, y: vectors or keys in data. They can be either categorical or numeric.



● load_dataset(): loads an example dataset from the online repository (the data repository for seaborn) and
returns a pandas DataFrame.
Scatter plot
● The scatter plot is a mainstay of statistical visualization.

● It depicts the joint distribution of two variables using a cloud of points, where each point represents
an observation in the dataset.

● This depiction allows the eye to infer a substantial amount of information about whether there is any
meaningful relationship between them.

● It is plotted using the scatterplot() method.


● Syntax: seaborn.scatterplot(x=None, y=None, data=None, hue=None)

● Parameters

○ data: DataFrame or NumPy array

○ x, y: variables that specify positions on the x and y axes

○ hue: Grouping variable that will produce points with different colors. Can be either categorical or numeric.
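A minimal scatterplot() sketch using hue. The housing data and column names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data for illustration.
houses = pd.DataFrame({
    "lot_size": [30, 35, 42, 50, 58, 66, 75, 90],
    "price": [120, 150, 180, 210, 260, 300, 340, 400],
    "furnished": ["no", "yes", "no", "yes", "no", "yes", "no", "yes"],
})

fig, ax = plt.subplots()

# Each point is one house; hue colors the points by the "furnished" group
# and adds a legend for the groups.
sns.scatterplot(data=houses, x="lot_size", y="price", hue="furnished", ax=ax)
ax.set_title("Price vs. lot size")
```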
Line plot
● For certain datasets, you may want to understand changes in one variable as a function of time, or of a
similarly continuous variable.

● In this case, drawing a line plot is a better option. It is plotted using the lineplot() method.

● Syntax: seaborn.lineplot(x=None, y=None, data=None, **kwargs)


● Examples: line plot using matplotlib and line plot using seaborn
2. Distribution plots
○ Distribution plots are used for examining univariate and bivariate distributions, that is,
distributions that involve one variable or two variables.

○ Syntax: sns.displot(data, x, y, hue, kind)

○ The kind parameter selects the approach to use:

■ histplot() (with kind="hist"; the default)

■ kdeplot() (with kind="kde")

■ ecdfplot() (with kind="ecdf"; univariate-only)


Example
2D histograms
● We have been using histograms to study statistics (mostly counts) of a single variable (price for the
examples above). These are called univariate distributions.

● With 2D histograms, we can expand that definition to two variables (bivariate distributions).

● Let's illustrate this by adding lot size to these plots.


Example

● To help with the interpretation, you can even include a colorbar.
3. Categorical plots
○ Relational plots focus on the relationship between two numerical variables.

○ Categorical plots are a more specialized approach, used when one of the main variables is categorical,
meaning that it takes on a fixed and limited number of possible values.

○ Syntax: sns.catplot(data, x, y, kind)

○ By default, a strip plot is created.
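A minimal catplot() sketch showing the default strip plot and one alternative kind. The housing table and its column names are hypothetical.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: number of bedrooms (categorical) and price (numerical).
houses = pd.DataFrame({
    "bedrooms": [2, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "price": [120, 130, 150, 180, 190, 210, 230, 300, 320, 360],
})

# With no kind given, catplot() draws a strip plot.
strip_grid = sns.catplot(data=houses, x="bedrooms", y="price")

# kind selects another axes-level function, here boxplot().
box_grid = sns.catplot(data=houses, x="bedrooms", y="price", kind="box")
```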


● There are many more axes-level functions for categorical plots. To help manage this, we can
divide these functions into 3 broad buckets:

○ Categorical scatter plots: These show individual points divided by categories. E.g., strip and
swarm plots

○ Categorical distribution plots: These aggregate the points for each category and show their
overall distribution and various statistics. E.g., box and violin plots

○ Categorical estimate plots: These also aggregate the points for each category, but focus on
showing a handful of statistics (mean, count, etc.) rather than the distribution. E.g., point and bar
plots
Strip plot
● It basically creates a scatter plot based on the
category. It is created using the stripplot() method.

● For various houses that have the same number of
bedrooms and price, the points are staggered
randomly so we can see that there are many points.
This is called "jitter".

● It is best to use this type of plot when you have few
data values.
Swarm plot
● A swarm plot is very similar to a strip plot, except
that the points are adjusted so that they do not
overlap. It is created using the swarmplot() method.
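The difference between the two categorical scatter plots is easiest to see side by side. The sketch below uses a made-up table in which several houses share the same number of bedrooms and the same price.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data with repeated (bedrooms, price) pairs.
houses = pd.DataFrame({
    "bedrooms": [2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4],
    "price": [120, 120, 120, 150, 200, 200, 200, 200, 230, 300, 300, 340],
})

fig, (ax_strip, ax_swarm) = plt.subplots(1, 2, figsize=(9, 3), sharey=True)

# Strip plot: identical points are shifted sideways at random (jitter).
sns.stripplot(data=houses, x="bedrooms", y="price", jitter=True, ax=ax_strip)
ax_strip.set_title("stripplot")

# Swarm plot: points are placed next to each other so none overlap.
sns.swarmplot(data=houses, x="bedrooms", y="price", ax=ax_swarm)
ax_swarm.set_title("swarmplot")

fig.tight_layout()
```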
Box plot
● A box plot shows the quartiles of the dataset,
while the whiskers extend to show the
rest of the distribution, except for points
determined to be outliers, which are drawn as dots.

● It is created using the boxplot() method.

● Falls under bivariate analysis, where 1
categorical and 1 continuous variable are
required.
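A minimal boxplot() sketch, together with the quartiles that the box represents, computed with pandas. The data is invented, and the 3-bedroom group deliberately contains one extreme price.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: one categorical and one continuous variable.
houses = pd.DataFrame({
    "bedrooms": [2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "price": [100, 110, 120, 130, 140, 150, 160, 170, 180, 400],
})

fig, ax = plt.subplots()
sns.boxplot(data=houses, x="bedrooms", y="price", ax=ax)

# The box edges and middle line are the quartiles of each group.
three_bed = houses.loc[houses["bedrooms"] == 3, "price"]
q1, median, q3 = three_bed.quantile([0.25, 0.5, 0.75])

# With the default whisker length (1.5 * IQR), any price above this limit
# is drawn as a separate outlier point; here that applies to 400.
upper_limit = q3 + 1.5 * (q3 - q1)
```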
Bar plot
● A bar plot aggregates a numerical variable
for each category using some
method; by default it's the mean.

● To use this plot we choose a categorical
column for the x axis and a numerical
column for the y axis, and it
creates a plot showing the mean per
category.

● It can be created using the barplot()
method.
● It also shows the confidence interval
(95%).

● A confidence interval is a range of values,
computed from the sample, that is expected to
contain the true population parameter
with a specified level of confidence.
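A minimal barplot() sketch that also checks the bar heights against the group means computed with pandas. The data and column names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: a categorical column and a numerical column.
houses = pd.DataFrame({
    "furnished": ["no", "no", "yes", "yes"],
    "price": [100, 200, 300, 500],
})

fig, ax = plt.subplots()

# One bar per category; bar height is the mean price of that category.
# The line on each bar is the confidence interval around that mean.
sns.barplot(data=houses, x="furnished", y="price", ax=ax)

# The same aggregation done by hand.
group_means = houses.groupby("furnished")["price"].mean()
bar_heights = sorted(bar.get_height() for bar in ax.patches)
```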
Count plot
● For the special case where you're
interested in the "count" of data (the
number of data points in each category),
you can use a "count plot", created using
the countplot() method.
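A minimal countplot() sketch, with the same counts computed by pandas for comparison. The bedrooms column is made up for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: number of bedrooms for six houses.
houses = pd.DataFrame({"bedrooms": [2, 2, 3, 3, 3, 4]})

fig, ax = plt.subplots()

# Only x is given: the bar height is the number of rows in each category.
sns.countplot(data=houses, x="bedrooms", ax=ax)

# The same counts computed by hand.
counts = houses["bedrooms"].value_counts().sort_index()
bar_heights = sorted(bar.get_height() for bar in ax.patches)
```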
Point plot
● Another variation of the bar plot is the
"point plot". It is created using the pointplot() method.

● A point plot shows only the mean (or
other estimator) value.

● Instead of showing the whole bar, it only
shows the point estimate (one point to
represent the height of the bar,
effectively).

● It also shows the confidence interval
(95%).
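A minimal pointplot() sketch. The data is hypothetical; the group means are also computed with pandas to show which values the points mark.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: two houses per bedroom count.
houses = pd.DataFrame({
    "bedrooms": [2, 2, 3, 3, 4, 4],
    "price": [100, 140, 200, 260, 300, 380],
})

fig, ax = plt.subplots()

# One point per category at the mean price, joined by a line,
# with a confidence interval drawn around each point.
sns.pointplot(data=houses, x="bedrooms", y="price", ax=ax)

# The values the points sit at.
means = houses.groupby("bedrooms")["price"].mean().tolist()
```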
Heat map
● A heat map is a graphical representation of data that uses colors to visualize the values of a
matrix.

● The x-axis is often some measure of time but can be any variable with groupings. The y-axis is a
variable that defines the categories in the data.

● Each rectangle is the same size. The rectangles are colored to show the magnitude of a third
variable.

● A heat map makes patterns in large datasets visible at a glance.
● Heatmaps in Seaborn can be plotted by using the seaborn.heatmap() function.

● Syntax: sns.heatmap(data, vmin=None, vmax=None, cmap=None)

○ data: 2D dataset that can be coerced into an ndarray

○ vmin, vmax: Values to anchor the colormap; otherwise they are inferred from the data

○ cmap: Mapping from data values to color space


Example
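A minimal heatmap() sketch with months on the x-axis and categories on the y-axis. The sales figures, product names, and the choice of the "viridis" colormap are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: units sold per product (rows) and month (columns).
sales = pd.DataFrame(
    [[10, 25, 40, 55],
     [80, 60, 45, 30],
     [5, 15, 70, 95]],
    index=["Product A", "Product B", "Product C"],
    columns=["Jan", "Feb", "Mar", "Apr"],
)

fig, ax = plt.subplots()

# Each cell is colored by its value. vmin and vmax pin the color scale
# to the range 0 to 100 instead of inferring it from the data.
sns.heatmap(sales, vmin=0, vmax=100, cmap="viridis", ax=ax)
```

Fixing vmin and vmax keeps the colors comparable when several heat maps are shown together.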
Any questions?
Find me at pemawangmo.gcit@rub.edu.bt
