Data Visualization – Plotting Graphs with PyPlot
Introduction: "A picture is worth a thousand words". Most of us are familiar with this
expression. Data visualization plays an essential role in the representation of both small
and large-scale data. This is especially true when explaining the analysis of increasingly
large datasets.
Data Visualization: It refers to the graphical or visual representation of information and
data using visual elements such as charts, graphs and maps. Data visualization is the
discipline of presenting data in a visual context so that it can be understood. Its
main goal is to distill large datasets into visual graphics that make complex relationships
within the data easy to understand. Several data visualization libraries are available in
Python, such as Matplotlib, Seaborn and Folium.
Purpose of Data Visualization: Data Visualization helps the decision-making process by
representing data graphically. Some of the main reasons for using data
visualization include:
Better Analysis
Quick Action
Identifying Patterns
Finding errors
Understanding the story
Exploring Business insights
Grasping the latest Trends
Python Plotting Library – PyPlot: For Data Visualization in Python, the PyPlot interface of the
Matplotlib library is used. Matplotlib is the whole Python package/library used to create 2D
graphs and plots from Python scripts, and pyplot is a module in matplotlib that
supports a very wide variety of graphs and plots, such as Line Charts, Bar Charts,
Histograms, Pie Charts and Scatter Charts. It is often used along with NumPy to provide a
MATLAB-like environment.
PyPlot: PyPlot provides the state-machine interface to the plotting library in Matplotlib. This
means that figures and axes are created implicitly and automatically to achieve the desired
plot. For example, calling plot() from pyplot will automatically create the necessary figure
and axes to achieve the desired plot. Setting a title will then automatically set that title on
the current axes object. The pyplot interface is generally preferred for non-interactive
plotting (i.e., scripting). The Matplotlib library provides the following features for data
visualization:
Drawing – plots can be drawn from the data passed to specific plotting functions.
Customization – plots can be customized as required by passing arguments to these
functions; for example, line color, style (dashed, dotted) and width can be set, and labels,
titles and legends can be added to plots.
Saving – after drawing and customization, plots can be saved for future use.
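A minimal sketch of the state-machine behaviour described above, assuming a recent Matplotlib: the hypothetical helper draw_styled_lines() never creates a figure or axes itself. The first plt.plot() call does that, and plt.title() and plt.legend() then act on the current axes, which plt.gca() returns. The sketch also uses the dashed ('--') and dotted (':') line styles mentioned under Customization. The sample data and the file name styles_demo.png are illustrative only.

```python
import matplotlib.pyplot as plt


def draw_styled_lines(filename=None):
    """Hypothetical helper: draw, customize and optionally save two lines."""
    plt.close("all")  # start with no open figures, so pyplot must create one
    x = [1, 2, 3, 4, 5]
    # Drawing: the first plot() call implicitly creates a figure and axes
    plt.plot(x, [1, 4, 9, 16, 25], color="g", linestyle="--", linewidth=2, label="dashed")
    plt.plot(x, [1, 2, 3, 4, 5], color="r", linestyle=":", label="dotted")
    # Customization: title() and legend() apply to the current axes
    plt.title("Line styles")
    plt.legend()
    # Saving: write the current figure to an image file if a name is given
    if filename is not None:
        plt.savefig(filename)
    return plt.gca()  # the axes that pyplot created for us


if __name__ == "__main__":
    draw_styled_lines("styles_demo.png")
    plt.show()
```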
Plotting Graphs with PyPlot: To plot graphs using pyplot, the following steps are used:
Install matplotlib with pip by typing the following command at the command
prompt:
pip install matplotlib
Create a .py file and import the pyplot module of the matplotlib library in it using either of the following statements:
import matplotlib.pyplot
OR
import matplotlib.pyplot as plt
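Pass the data points to the plot() function, for example plt.plot(x, y)
Customize the plot by changing different parameters
Save the plot/graph with savefig() if required; do this before calling show(), because once
the plot window is closed the figure is discarded and a later savefig() saves a blank image
Call the show() function to display the plot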
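The plotting steps above can be sketched as one short script. This is a minimal example under simple assumptions: the helper name make_plot, the sample data and the file name squares.png are hypothetical. Note that savefig() is called before show().

```python
# Import the pyplot module under its usual alias
import matplotlib.pyplot as plt


def make_plot(x, y, filename=None):
    """Hypothetical helper that follows the plotting steps for one line graph."""
    plt.close("all")             # begin with a fresh figure
    plt.plot(x, y)               # set the data points in plot()
    plt.title("Square numbers")  # customize the plot
    plt.xlabel("x")
    plt.ylabel("x squared")
    if filename is not None:
        plt.savefig(filename)    # save before show(), which may discard the figure
    return plt.gca()


if __name__ == "__main__":
    make_plot([1, 2, 3, 4, 5], [1, 4, 9, 16, 25], filename="squares.png")
    plt.show()                   # display the plot
```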
Types of Plots/Graphs using Matplotlib: Data Visualization essentially means the graphical
representation of compiled data. Thus, graphs and charts are very effective tools for Data
Visualization. We can create many different types of graphs and charts using pyplot. Some
of the commonly used chart types are:
1) Line Chart: A Line Chart or Line Graph is a type of chart which displays
information as a series of data points called 'markers' connected by straight line segments.
To plot a line chart with pyplot, the plot( ) function is used.
2) Bar Chart: A Bar Chart or Bar Graph is a chart or graph that presents categorical
data with rectangular bars with heights or lengths proportional to the values they
represent. The bars can be plotted vertically by using the bar( ) function or horizontally by
using the barh( ) function.
3) Histogram Plot: A Histogram is a type of graph that provides a visual interpretation of
numerical data by showing the number of data points that lie within each range of
values (bin). To plot a Histogram with pyplot, the hist( ) function is used.
4) Pie Chart: A Pie Chart or Circle Chart is a circular statistical graphic, which is
divided into slices to illustrate numerical proportions. A pie chart can plot only one
data sequence. To plot a Pie Chart with pyplot, the pie( ) function is used.
5) Scatter Plot: The Scatter Plot is similar to a Line Chart. The major difference is that
while a Line Graph connects the data points with a line, a Scatter Chart simply plots the
data points to show the trend in the data. To plot a Scatter Chart with pyplot, the
scatter( ) function is used.
6) Box Plot: A Box Plot is the visual representation of the statistical five-number
summary (minimum, first quartile, median, third quartile and maximum) of a given data set.
To plot a Box Plot with pyplot, the boxplot( ) function is used.
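To see the six chart types listed above side by side, the following sketch draws one small example of each in a 2 x 3 grid using plt.subplot(). The data values and the helper name chart_gallery are hypothetical sample choices; only the pyplot functions named in the list are used for the charts themselves.

```python
import matplotlib.pyplot as plt


def chart_gallery():
    """Hypothetical helper: one small example of each common chart type."""
    plt.close("all")
    fig = plt.figure(figsize=(12, 7))
    x = [1, 2, 3, 4, 5]
    y = [2, 3, 5, 7, 11]
    marks = [35, 42, 48, 51, 55, 58, 60, 63, 67, 72, 75, 81, 88, 94]  # sample data

    plt.subplot(2, 3, 1)
    plt.plot(x, y, marker="o")                     # points joined by straight lines
    plt.title("Line Chart")

    plt.subplot(2, 3, 2)
    plt.bar(["A", "B", "C"], [5, 9, 3])            # vertical bars; barh() draws horizontal ones
    plt.title("Bar Chart")

    plt.subplot(2, 3, 3)
    plt.hist(marks, bins=5)                        # counts of values in 5 ranges
    plt.title("Histogram")

    plt.subplot(2, 3, 4)
    plt.pie([40, 35, 25], labels=["Rent", "Food", "Other"])  # a single data sequence
    plt.title("Pie Chart")

    plt.subplot(2, 3, 5)
    plt.scatter(x, y)                              # points only, no connecting line
    plt.title("Scatter Plot")

    plt.subplot(2, 3, 6)
    plt.boxplot(marks)                             # box = quartiles and median; whiskers up to 1.5 x IQR by default
    plt.title("Box Plot")

    plt.tight_layout()
    return fig


if __name__ == "__main__":
    chart_gallery()
    plt.show()
```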
Creating a Line Chart: Before creating any type of chart or graph, make sure
to import the matplotlib.pyplot module using either of these statements:
import matplotlib.pyplot
OR
import matplotlib.pyplot as plt
To create a Line Chart, the PyPlot interface offers the plot( ) function. For example, consider
the following code, which plots a line graph:
import matplotlib.pyplot as plt
x=[1,2,3,4,5]
y=[1,4,9,16,25]
plt.plot(x,y)
plt.show()
Customizing Line Plot/Graph: The plot( ) function accepts a number of arguments, and pyplot
provides several related functions, that we can use to customize a Line Plot or Graph, such as:
1) plt.figure(figsize=(Width, Height)): used to increase or decrease the size of the figure.
The width and height are given in inches, and this function is called before plot( ).
For example: plt.figure(figsize=(15,7))
2) color='color name or code': used to change the color of the line.
Color codes are: 'b' Blue, 'g' Green, 'r' Red, 'm' Magenta, 'y' Yellow,
'k' Black, 'c' Cyan, 'w' White.
We can also specify the color by its full name (such as 'green') or by a hex
string such as '#008000'.
For example: plt.plot(x, y, color='b')
3) linewidth=width: used to change the width of the line, where the width value is in
points.
For example: plt.plot(x, y, color='b', linewidth=2)
4) marker='marker type': Markers are the symbols drawn at the data points plotted on a
graph/chart. To change the marker type, the marker argument is used; for example, 'o'
draws circles, 's' squares and 'd' thin diamonds.
For example: plt.plot(x, y, color='b', linewidth=2, marker='d')
One can also change the size and edge color of the markers by using the 'markersize' and
'markeredgecolor' arguments.
For example: plt.plot(x, y, color='b', linewidth=2, marker='d', markersize=5,
markeredgecolor='red')
5) plt.xlabel('X-Axis Label'): used to specify a label for the x-axis.
6) plt.ylabel('Y-Axis Label'): used to specify a label for the y-axis.
7) plt.title('Title of the Chart'): used to specify a title for the chart.
8) plt.show(): used to display the chart/graph.
9) plt.savefig('Filename'): used to save a plot/graph created using pyplot functions for
later use or for keeping records. Call it before plt.show(), since the figure is discarded once
the display window is closed.
For example: plt.savefig('MonthlySale.jpg')
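Putting the customization options above together, a minimal sketch of a customized line graph follows. The month names, sales figures, the helper name monthly_sales_chart and the file name MonthlySale.png are hypothetical; the arguments and functions are the ones listed above, and the numbered comments refer to that list.

```python
import matplotlib.pyplot as plt


def monthly_sales_chart(filename=None):
    """Hypothetical example of a line graph using the options listed above."""
    plt.close("all")
    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
    sales = [120, 135, 128, 150, 165, 158]     # made-up sample figures
    plt.figure(figsize=(15, 7))                # 1) width and height in inches
    plt.plot(months, sales,
             color="b",                        # 2) color code for blue
             linewidth=2,                      # 3) line width in points
             marker="d",                       # 4) thin-diamond markers
             markersize=5,
             markeredgecolor="red")
    plt.xlabel("Month")                        # 5)
    plt.ylabel("Sales")                        # 6)
    plt.title("Monthly Sales")                 # 7)
    if filename is not None:
        plt.savefig(filename)                  # 9) save before show()
    return plt.gca()


if __name__ == "__main__":
    monthly_sales_chart("MonthlySale.png")
    plt.show()                                 # 8)
```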
Creating a Bar Chart: A Bar Graph or Bar Chart is a graphical display of data using
bars of different heights. A bar chart can be drawn vertically or horizontally using rectangles
(bars) of different heights or widths. To plot a Bar Graph or Bar Chart, pyplot offers the bar( )
function, in which we specify the sequence for the x-axis and the
corresponding sequence to be plotted on the y-axis. Each y-value is plotted as a bar at the
corresponding x-value on the x-axis.
For example: Write a Python program to plot a Bar Graph for the population of four different
cities, 'Delhi', 'Mumbai', 'Bangalore' and 'Hyderabad'. The populations of the given cities
are 23456123, 20083104, 18456123 and 134110931.
import matplotlib.pyplot as plt
# City names go on the x-axis, their populations on the y-axis
cities=['Delhi', 'Mumbai', 'Bangalore', 'Hyderabad']
population=[23456123, 20083104, 18456123, 134110931]
plt.bar(cities,population)
plt.xlabel('Cities')
plt.ylabel('Population')
plt.show()
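Since bars can also be drawn horizontally, here is a sketch of the same population data drawn with barh( ). With horizontal bars the values run along the x-axis, so the axis labels swap. The helper name population_barh is hypothetical.

```python
import matplotlib.pyplot as plt


def population_barh():
    """Hypothetical helper: the city population data as horizontal bars."""
    plt.close("all")
    cities = ["Delhi", "Mumbai", "Bangalore", "Hyderabad"]
    population = [23456123, 20083104, 18456123, 134110931]  # values as given above
    bars = plt.barh(cities, population)   # one horizontal bar per city
    plt.xlabel("Population")              # values now run along the x-axis
    plt.ylabel("Cities")
    return bars


if __name__ == "__main__":
    population_barh()
    plt.show()
```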
Creating Multiple Bar Charts: There may be situations where we need to plot multiple data
ranges on the same bar chart, creating multiple bars. PyPlot does not provide a specific
function for this, but we can plot multiple bars on the same bar chart by shifting the x
positions of each range and using the width and color arguments of the bar( ) function. To
plot multiple bars, the following things need to be done in advance:
1) Decide the number of X points: First, determine how many X points we need. For
this, count the number of entries in the ranges being plotted.
2) Decide the thickness (width) of each bar and shift the X points of each range by that
amount, so that the bars sit side by side.
3) Give different colors to different data ranges.
4) Keep the width argument the same for all ranges being plotted.
5) Plot each range separately using bar( ).
For example: Write a program in Python to plot a bar chart for the medals won by the top four
countries, namely Australia [80,59,59,198], England [45,45,46,136], India [26,20,20,66]
and Canada [15,40,27,82], where the values represent the number of Gold, Silver, Bronze
and Total medals respectively.
import matplotlib.pyplot as plt
import numpy as np
medal=['Gold','Silver','Bronze','Total']
Australia=[80,59,59,198]
England=[45,45,46,136]
India=[26,20,20,66]
Canada=[15,40,27,82]
# One X point per medal type: 0, 1, 2, 3
x=np.arange(len(medal))
# Each range is shifted by one bar width (0.15) so the bars sit side by side
plt.bar(x,Australia,width=.15,label='Australia')
plt.bar(x+0.15,England,width=.15,label='England')
plt.bar(x+0.30,India,width=.15,label='India')
plt.bar(x+0.45,Canada,width=.15,label='Canada')
# Show the medal names under the middle of each group of four bars
plt.xticks(x+0.225,medal)
plt.legend()
plt.title('Medal Tally of Four Countries')
plt.xlabel('Medals')
plt.ylabel('No. of Medals')
plt.show()
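The five steps for multiple bars can also be written as a loop, so that any number of data ranges can be placed side by side. This is a sketch under the assumption that every range has one value per category; the helper name grouped_bars is hypothetical. The i-th range is shifted by i bar widths, each bar( ) call takes the next color from Matplotlib's default color cycle, and the category labels are centred under each group with plt.xticks().

```python
import matplotlib.pyplot as plt
import numpy as np


def grouped_bars(categories, series, width=0.15):
    """Hypothetical helper: plot several data ranges as side-by-side bars.

    series maps each range's label to a list with one value per category.
    """
    plt.close("all")
    x = np.arange(len(categories))            # one X point per category
    for i, (label, values) in enumerate(series.items()):
        # shift the i-th range by i bar widths; the width stays the same for all ranges
        plt.bar(x + i * width, values, width=width, label=label)
    # centre each category label under its group of bars
    plt.xticks(x + width * (len(series) - 1) / 2, categories)
    plt.legend()
    return plt.gca()


if __name__ == "__main__":
    medal = ["Gold", "Silver", "Bronze", "Total"]
    grouped_bars(medal, {
        "Australia": [80, 59, 59, 198],
        "England": [45, 45, 46, 136],
        "India": [26, 20, 20, 66],
        "Canada": [15, 40, 27, 82],
    })
    plt.title("Medal Tally of Four Countries")
    plt.show()
```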
Creating Histogram with PyPlot: A Histogram is a summary of discrete or continuous
data. It provides a visual interpretation of numerical data by showing the number of data
points that fall within specified ranges of values (called bins). It is similar to a vertical bar
graph; however, unlike a vertical bar graph, a Histogram shows no gaps between the bars.
A Histogram is a great way to show the distribution of continuous data, such as weight,
height, time taken and so forth. But when the data is in categories (such as Country or
Subject), one should use a bar graph.
To create a Histogram from a given sequence of numbers, the hist( ) function is used. The
commonly used form of the hist( ) function is:
plt.hist(x, bins=None, cumulative=False, histtype='bar', align='mid', orientation='vertical')
Parameters:
x: array or sequence of arrays to be plotted on the Histogram.
bins: optional. If an integer is given, bins + 1 bin edges are calculated and returned. If it
is omitted, Matplotlib's default setting (10 bins) is used.
cumulative: bool, optional. If True, a Histogram is computed where each bin gives the count
in that bin plus all bins for smaller values, so the last bin gives the total number of data
points. Default is False.
histtype: one of 'bar', 'barstacked', 'step', 'stepfilled'; optional. The type of Histogram to
draw (default 'bar'):
'bar' is a traditional bar-type Histogram. If multiple data are given, the bars are arranged
side by side.
'barstacked' is a bar-type Histogram where multiple data are stacked on top of each other.
'step' generates a line plot that is by default unfilled.
'stepfilled' generates a line plot that is by default filled.
orientation: one of 'horizontal', 'vertical'; optional. If 'horizontal', barh( ) is used for a
bar-type Histogram. Default is 'vertical'.
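The bins and cumulative parameters can be checked directly, because hist( ) returns the bin counts, the bin edges and the drawn bar patches. The sketch below, with a hypothetical helper histogram_counts and a small made-up data list, shows that 3 bins give 4 edges and that with cumulative=True the last bin holds the total number of data points.

```python
import matplotlib.pyplot as plt


def histogram_counts(data, bins=10, cumulative=False):
    """Hypothetical helper: draw a histogram and return what hist() computed."""
    plt.close("all")
    # hist() returns the bin counts, the bin edges and the drawn bar patches
    counts, edges, _patches = plt.hist(data, bins=bins, cumulative=cumulative,
                                       edgecolor="black")
    return counts, edges


if __name__ == "__main__":
    data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    counts, edges = histogram_counts(data, bins=3)
    print(len(edges))          # bins + 1 edges
    print(counts)              # number of values in each bin
    running, _ = histogram_counts(data, bins=3, cumulative=True)
    print(running[-1])         # total number of data points
    plt.show()
```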
For Example: Prof. Awasthi is doing some research in the field of Environment. For
plotting purposes, he has generated the following data:
mu=100
sigma=15
x=mu+ sigma * numpy.random.randn(10000)
Write a program to plot this data as a horizontal histogram.
Solution:
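import matplotlib.pyplot as plt
import numpy as np
mu=100
sigma=15
# 10000 normally distributed values with mean mu and standard deviation sigma
x=mu+sigma*np.random.randn(10000)
# orientation='horizontal' draws the bars with barh( )
plt.hist(x, bins=30, orientation='horizontal')
plt.title('Research data Histogram')
plt.show()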
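To compare the histtype values described earlier, the sketch below draws two data sets in each of the four styles. The first data set has the same form as Prof. Awasthi's data (mean 100, standard deviation 15); the second data set, the seed and the helper name compare_histtypes are hypothetical. It uses NumPy's default_rng() generator instead of numpy.random.randn() so that the output is reproducible.

```python
import matplotlib.pyplot as plt
import numpy as np


def compare_histtypes(seed=0):
    """Hypothetical helper: draw the same two data sets with each histtype."""
    plt.close("all")
    rng = np.random.default_rng(seed)            # seeded so the data is reproducible
    a = 100 + 15 * rng.standard_normal(1000)     # same form as the research data
    b = 110 + 10 * rng.standard_normal(1000)     # a second, made-up data set
    fig = plt.figure(figsize=(10, 6))
    for i, kind in enumerate(["bar", "barstacked", "step", "stepfilled"], start=1):
        plt.subplot(2, 2, i)
        # passing a list of two arrays plots two data sets in one histogram
        plt.hist([a, b], bins=20, histtype=kind, label=["a", "b"])
        plt.title(kind)
        plt.legend()
    plt.tight_layout()
    return fig


if __name__ == "__main__":
    compare_histtypes()
    plt.show()
```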