You are on page 1of 24

Objectives

Chapter 4
 Introduction to Matplotlib
 Build data visualizations with the Matplotlib
Data Visualizations with Matplotlib library
 Setting up Matplotlib
 Basic Plotting with Matplotlib
Lecturer. Hanad Mohamud Mohamed
 Customizing Plots
 Advanced Plotting Techniques
 Build data visualization with Seaborn Lib
Data Visualization
Data visualization in the context of data science is a critical aspect that involves the
graphical representation of information and data. By using visual elements like charts,
graphs, and maps, data visualization tools and techniques provide an accessible way to
see and understand trends, outliers, and patterns in data
visualization serves several key roles in data science:
Simplifying Complex Data: Data science often deals with large, complex datasets.
Visualization helps in breaking down these complexities, making the data more
understandable. By converting numbers and metrics into visual stories, it becomes easier
for data scientists and stakeholders to grasp difficult concepts or identify new patterns.
Communicating Findings: One of the primary uses of data visualization is to communicate
results to stakeholders. Effective visualizations can convey the findings of complex data
analyses in a manner that is accessible to a non-technical audience. This is crucial in
decision-making processes where insights drawn from data need to be understood by
those who may not have a background in data science.
Some Data Visualizations widely used
Matplotlib: Matplotlib is a widely used Python library for creating static, interactive, and
animated visualizations. It offers extensive customization options, making it ideal for
creating a wide range of 2D graphs and plots. Its versatility and ease of use make it a
popular choice for beginners and professionals alike.
Seaborn: Built on top of Matplotlib, Seaborn simplifies the process of creating beautiful
and informative statistical graphics. It integrates closely with pandas data structures and
provides high-level functions for creating complex visualizations, including heatmaps and
violin plots, with more aesthetic default styles.
Plotnine (ggplot): Inspired by R’s ggplot2, Plotnine is based on the Grammar of Graphics,
allowing users to create complex visualizations by adding layers of components. It’s
particularly useful for those familiar with ggplot2, providing a way to use similar
functionality in Python.
Cont…
Bokeh: Bokeh excels at creating interactive and real-time streaming visualizations in web
browsers. It's great for creating highly interactive plots, dashboards, and data
applications that can connect to real-time data feeds.
Pygal: Pygal, a dynamic SVG charting library, stands out for its ability to produce SVG
(Scalable Vector Graphics) charts. It’s particularly suited for creating charts that are
easily scalable and can be embedded in web pages.
Plotly: Plotly is known for its interactive plots and is highly compatible with web
applications. It supports a wide range of charts and graphs, including 3D charts, and
integrates well with other libraries and frameworks.
Geoplotlib: This library is designed for creating maps and geographical plots, making it
perfect for visualizing spatial data. Geoplotlib can be used to create a variety of map
types, including heatmaps and dot density maps.
Cont..

Missingno: Specifically designed for handling missing data, Missingno provides a small
toolset of flexible and easy-to-use visualizations and utilities for identifying patterns of
missing data in datasets, which is crucial for effective data cleaning and preprocessing.

Folium: Folium is particularly useful for visualizing geospatial data in Python. It leverages
the mapping strengths of the Leaflet.js library, enabling the creation of sophisticated maps
and geographical data visualizations directly within the Python environment.

Each of these libraries has its own strengths and ideal use cases, offering a rich set of tools
for data analysts and scientists to visually interpret and present data.
Data Visualization
We will Focus on Matplotlib and Seaborn Data Visualization Libraries
"A picture is worth a thousand words". Most of us are familiar with this expression. Data
visualization plays an essential role in the representation of both small and large-scale data. It
especially applies when trying to explain the analysis of increasingly large datasets.

Data visualization is the discipline of trying to expose the data to understand it by placing it in
a visual context. Its main goal is to distill large datasets into visual graphics to allow for an easy
understanding of complex relationships within the data.

Several data visualization libraries are available in Python, namely Matplotlib, Seaborn, and
Folium etc.
Purpose of Data Visualization

• Better analysis
• Quick action
• Identifying patterns
• Finding errors
• Understanding the story
• Exploring business insights
• Grasping the Latest Trends
Matplotlib for plotting a dataframe
• The matplotlib is a comprehensive package for data visualization and
comes as part of Anaconda.
 Pandas has a convenient integration with matplotlib, which means that data
contained in a dataframe can be plotted with plot():
• You can select the plot type of your choice ( e.g., scatter, bar, boxplot, pie,
hist, …) corresponding to your data
• Please see the resource at the end for more information on various plots
and arguments of the plot() function
Plotting library
Matplotlib is the whole Python package/ library used to create 2D graphs and plots by
using Python scripts. pyplot is a module in Matplotlib, which supports a very wide
variety of graphs and plots namely - histograms, bar charts, power spectra, error charts,
etc. It is used along with NumPy to provide an environment for MatLab.

Pyplot provides the state-machine interface to the plotting library in matplotlib. It means
that figures and axes are implicitly and automatically created to achieve the desired plot.
For example, calling plot from pyplot will automatically create the necessary figure and
axes to achieve the desired plot. Setting a title will then automatically set that title to the
current axes object.The pyplot interface is generally preferred for non-interactive
plotting (i.e., scripting).
Matplotlib – pyplot features

Following features are provided in matplotlib library for data visualization.


• Drawing – plots can be drawn based on passed data through specific
functions.
• Customization – plots can be customized as per requirement after
specifying it in the arguments of the functions.Like color, style (dashed,
dotted), width; adding label, title, and legend in plots can be customized.
• Saving – After drawing and customization plots can be saved for future use.
How to plot in matplotlib

Steps to plot in matplotlib


• Install matplotlib by pip command -
pip install matplotlib in command prompt
• import the matplotlib library in it using
- import matplotlib.pyplot as plt statement
• Set data points in plot() method of plt object
• Customize parameters plot by changing
different
• Call the show() method to display plot
• Save the plot/graph if required
Types of plot using matplotlib

• LINE PLOT
• BAR GRAPH
• HISTOGRAM
• Bie chat
• Scatter plot
• etc
Matplotlib –line plot

Line Plot
A line plot/chart is a graph that shows the frequency of data occurring
along a number line.
The line plot is represented by a series of datapoints connected with a
straight line. Generally line plots are used to display trends over time. A
line plot or line graph can be created using the plot() function available in
pyplot library. We can, not only just plot a line but we can explicitly define
the grid, the x and y axis scale and labels, title and display options etc.
Matplotlib –line plot
E.G.PROGRAM
import numpy as np
import matplotlib.pyplot as plt
year = [2020,2021,2022,2023]
ca202_pass_percentage = [90,92,94,95]
ca204_pass_percentage = [89,91,93,95]
plt.plot(year, ca202_pass_percentage,
color='g')
plt.plot(year, ca204_pass_percentage,
color='orange')
plt.xlabel('year')
plt.ylabel('Pass percentage')
plt.title('Pass percentage class of 2020')
plt.show()
Note:- As many lines required call
plot() function multiple times with
suitable arguments.
Line Plot customization
• Custom line color
• plt.plot(year, passpercentage, color='orange')
• Change the value in color argument.like ‘b’ for blue,’r’,’c’,…..
• Custom line style
• plt.plot( [1,1.1,1,1.1,1], linestyle='-' , linewidth=4).
• set linestyle to any of '-‘ for solid line style, '--‘ for dashed, '-.‘ , ':‘ for dotted line
• Custom line width
• plt.plot( 'x', 'y', data=df, linewidth=22)
set linewidth as required
• Title
• plt.title(‘pass class of 2020') – Change
it as per requirement
• Lable - plt.xlabel(‘Year') - change x or y label
as per requirement
• Legend - plt.legend((‘ca202’,’ca204'),loc='upper
Plotting with Pyplot
Plot bar graphs
label = ['Mohamed', 'Alim', 'Warfa', 'Hirsi', 'Samater',
'Daud']
per = [94,85,45,25,50,54]
index = np.arange(len(label))
plt.bar(index, per)
plt.xlabel('Student Name', fontsize=8)
plt.ylabel('Percentage', fontsize=5)
plt.xticks(index, label, fontsize=8, rotation=30)
plt.title('Percentage of Marks achieve by student Class
XII')
plt.show()
#Note – use barh () for horizontal bars
Bar graph customization
• Custom bar color
• plt.bar(index, per,color="green",edgecolor="blue")
• Change the value in color,edgecolor argument.like ‘b’ for blue,’r’,’c’,…..
• Custom line style
• plt.bar(index, per,color="green",edgecolor="blue",linewidth=4,linestyle='--')
• set linestyle to any of '-‘ for solid line style, '--‘ for dashed, '-.‘ , ':‘ for dotted line
• Custom line width
• plt.bar(index, per,color="green",edgecolor="blue",linewidth=4)
set linewidth as required
• Title
•plt.title('Percentage of Marks achieve by student Class
XII') Change it as per requirement
• Lable - plt.xlabel('Student Name', fontsize=5)- change x or y label
as per requirement
• Change (),loc,frameon property as per requirement
Matplotlib Histogram

A histogram is a graphical representation which organizes a group of data


points into user-specified ranges.
Histogram provides a visual interpretation of numerical data by showing the
number of data points that fall within a specified range of values (“bins”).
It is similar to a vertical bar graph but without gaps between the bars.
data = [1,11,21,31,41] #first argument of hist() method is
position (x,y Coordinate) of weight,
plt.hist([5,15,25,35,45], where weight is to be displayed.
bins=[0,10,20,40,70], No of coordinates must match with
No of weight otherwise error will
weights=[20,10,45,33,8], generate
edgecolor="red") #Second argument is interval
#Third argument is weight for bars
plt.show()
Customization of Histogram –
By default bars of histogram is displayed in blue color but we can change it to other
color with following code
plt.hist([1,11,21,31,41, 51],
bins=[0,10,20,30,40,50, 60], weights=[10,1,0,33,6,8],
acecolor='y',
edgecolor="red")
In above code we are passing ‘y’ as facecolor means yellow color to be displayed in
bars.
To give a name to the histogram write below code before calling show()
plt.title("Histogram Heading")
Edge color and bar color can be set using following parameter in hist() method
edgecolor='#E6E6E6',color='#EE6666 .color value can be rgb in hexadecimal form For
x and y label below code can be written
plt.xlabel('Value')
plt.ylabel('Frequency')
Matplotlib – How to save plot
For future use we have to save the plot.To save any plot savefig() method is
used.plots can be saved like pdf, svg, png, jpg file formats.
plt.savefig('line_plot.pdf')
plt.savefig('line_plot.svg')
plt.savefig('line_plot.png') Parameter for
saving plots .e.g.
plt.savefig('line_plot.jpg', dpi=300, quality=80, optimize=True,
progressive=True)
Which Export Format to Use?
plt.savefig('pass_percentage_plot_1.png', dpi=300, bbox_inches='tight')
plt.show()
The export as vector-based SVG or PDF files is generally preferred over
bitmap-based PNG or JPG files as they are richer formats, usually providing
higher quality plots along with smaller file sizes.
Seaborn
Seaborn is a powerful data visualization library in Python that is built on top of
Matplotlib. It provides a high-level interface for creating beautifully styled statistical
graphs. One of Seaborn's standout features is its ability to play well with dataframes,
making it synergistic with Pandas, a library renowned for data manipulation.

Setting up Seaborn
To start working with Seaborn, it's essential to have both Seaborn and Matplotlib
imported into the workspace. This can be achieved with the following code:

import seaborn as sns


import matplotlib.pyplot as plt
Seaborn, a top-tier Python visualization library, provides aesthetically pleasing graphs
that are also informative. In the domain of film analytics, understanding the patterns
and distributions is essential,
MATPLOTLIB VS. SEABORN
Matplotlib is a library in Python that enables users to generate visualizations like
histograms, scatter plots, bar charts, pie charts and much more.
Seaborn is a visualization library that is built on top of Matplotlib. It provides data
visualizations that are typically more aesthetic and statistically sophisticated.

The most well-known of these data visualization libraries in Python, Matplotlib, enables
users to generate visualizations like histograms, scatter plots, bar charts, pie charts and
much more.
Seaborn is another useful visualization library that is built on top of Matplotlib. It provides
data visualizations that are typically more aesthetic and statistically sophisticated. Having a
solid understanding of how to use both of these libraries is essential for any data scientist
or data analyst as they both provide easy methods for visualizing data for insight
Note. We going to implement seaborn in the film data set
END

You might also like