Fundamentals of Data Visualization in Python
Data visualization is the graphical or pictorial representation of
data. It allows high-level representatives to see analytics, grasp
difficult concepts, and identify new patterns with ease.
Ex: K-means clustering. It is easier to understand the cluster labeling
done by an algorithm when the clusters are plotted. It is hardly
possible to label clusters just by checking raw data.
The famous Anscombe's Quartet example explains the importance of
data visualization. Python code for it is readily available.
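A minimal sketch of the Anscombe's Quartet idea, using the standard published values of the four datasets: the summary statistics are nearly identical, but the scatter plots look completely different. The `Agg` backend line is an assumption so the snippet also runs without a display; in a notebook you can drop it and call `plt.show()` at the end.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Anscombe's quartet: datasets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# the numbers alone suggest the four datasets are the same
summary = {}
for name, (x, y) in quartet.items():
    summary[name] = (np.mean(x), np.mean(y), np.corrcoef(x, y)[0, 1])
    print(name, "mean x = %.2f, mean y = %.2f, corr = %.3f" % summary[name])

# the plots show that they are not
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (name, (x, y)) in zip(axes.flat, quartet.items()):
    ax.scatter(x, y)
    ax.set_title("Dataset " + name)
fig.tight_layout()
```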
Matplotlib in Python
An open-source Python library for visualization.
It creates graphs and plots from a Python script and can save
plots to the local system. It also provides an object-oriented API.
[Link](‘[Link]’)
It has a module called pyplot, which offers simple functions for
adding plot elements (lines, images, text, labels, etc.).
It supports a wide range of graphs:
Line plot, Bar plot, Scatter plot, Histogram,
Image plot, Box plot, Violin plot, Stream plot, Quiver plot, Area plot,
Pie plot, and Donut plot.
It integrates easily with Pandas and NumPy.
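The object-oriented API and plot saving mentioned above can be sketched as follows. This is only an illustration: the output file name `line_plot_demo.png` and the temp-directory location are hypothetical choices, and the `Agg` backend is assumed so the snippet runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = 3 * x + 5

# object-oriented style: work with explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Line Plot Demo")
ax.set_xlabel("X-Axis")
ax.set_ylabel("Y-Axis")

# save the figure to the local system (hypothetical file name)
out_path = os.path.join(tempfile.gettempdir(), "line_plot_demo.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)
print("saved to", out_path)
```

The pyplot functions used in the rest of this article (`plt.plot`, `plt.title`, and so on) act on an implicit "current" figure; the explicit `fig` and `ax` objects do the same work but are easier to manage once a script holds several plots.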
Line Plot
import numpy as np
import matplotlib.pyplot as plt
# display plots inline, below the notebook cell
%matplotlib inline
# defining the dataset
x = np.arange(0, 10, 0.1)
y = 3*x + 5
# plotting the datapoints
plt.plot(x, y)
plt.show()
Customizing Line plots (Compare with basic line plot)
import numpy as np
import matplotlib.pyplot as plt
# display plots inline, below the notebook cell
%matplotlib inline
# defining the dataset
x = np.arange(0, 10, 1)
y = 3*x + 5
# plotting the datapoints
plt.plot(x, y, linewidth=2.0, linestyle=":", color='y', alpha=0.7, marker='o')
plt.title("Line Plot Demo")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.legend(['line1'], loc='best')
plt.grid(True)
plt.show()
Figure Size
...
y=3*x+5
# changing the figure size (width, height in inches)
fig = plt.figure(figsize=(10, 5))
#plotting the datapoints
...
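Since the snippet above is abbreviated, here is one possible complete version, a sketch that reuses the dataset from the line plot section. `figsize` is given as (width, height) in inches, and `get_size_inches()` is used only to confirm the setting. The `Agg` backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# same dataset as the customized line plot
x = np.arange(0, 10, 1)
y = 3 * x + 5

# create the figure first, then plot into it
fig = plt.figure(figsize=(10, 5))  # 10 inches wide, 5 inches tall
plt.plot(x, y)
plt.title("Line Plot Demo")

# read the size back from the figure
width, height = fig.get_size_inches()
print(width, height)
```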
Subplots
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x = np.arange(0, 10, 1)
y1 = 2*x + 5
y2 = 3*x + [Link]
plt.subplot(2, 1, 1) #A
# B - plt.subplot(1, 2, 1)
# arguments: (number of rows, number of columns, index of this plot)
plt.plot(x, y1)
plt.title('Graph1')
plt.subplot(2, 1, 2) #A
# B - plt.subplot(1, 2, 2)
plt.plot(x, y2)
plt.title('Graph2')
plt.show()
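The same two-graph layout can also be written with `plt.subplots`, which creates the figure and all axes in one call. This is an alternative sketch, not the approach used above; the intercept `10` in `y2` is an illustrative value chosen here, and the `Agg` backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 1)
y1 = 2 * x + 5
y2 = 3 * x + 10  # illustrative intercept, chosen for this sketch

# 2 rows, 1 column: the graphs are stacked vertically (layout A)
# use plt.subplots(1, 2) to place them side by side (layout B)
fig, axes = plt.subplots(2, 1, figsize=(6, 6))
axes[0].plot(x, y1)
axes[0].set_title("Graph1")
axes[1].plot(x, y2)
axes[1].set_title("Graph2")
fig.tight_layout()  # keeps titles from overlapping the plot above
```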
Bar plot
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
data = {'apples':20, 'Mangoes':15, 'lemon':30, 'Oranges':10}
names = list(data.keys())
values = list(data.values())
plt.subplot(3, 1, 1)
#fig = plt.figure(figsize=(10, 5))
plt.bar(names, values, color="orange")
plt.title("Bar Graph Demo")
plt.xlabel("Fruits")
plt.ylabel("Quantity")
plt.subplot(3, 1, 3)
plt.bar(names, values, color="orange")
plt.title("Bar Graph Demo")
plt.xlabel("Fruits")
plt.ylabel("Quantity")
plt.show()
Scatter Plots
import matplotlib.pyplot as plt
%matplotlib inline
# dataset. Note: a, y1 and y2 should be of the same size
a = [10,20,30,40,50,60,70,80]
y1 = [2,3,5,6,1,4,5,3]
y2 = [1,2,3,4,5,5,1,3]
plt.scatter(a, y1)
plt.scatter(a, y2)
plt.show()
Customizing Scatter plots (Compare with basic scatter plot)
import matplotlib.pyplot as plt
%matplotlib inline
# dataset. Note: a, y1 and y2 should be of the same size
a = [10,20,30,40,50,60,70,80]
y1 = [2,3,5,6,1,4,5,3]
y2 = [1,2,3,4,5,5,1,3]
plt.scatter(a, y1, c='g', s=300, edgecolors='y', marker='o', alpha=0.5)
plt.scatter(a, y2, c='y', s=400, edgecolors='b', marker='3', alpha=1)
plt.legend(['y1','y2'], loc='best')
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.grid(True)
plt.show()
Histogram
A histogram is a graphical display of data using bars of
different heights. Each bar groups numbers into a range, and
taller bars show that more data falls in that range.
A histogram displays the shape and spread of
continuous sample data.
import matplotlib.pyplot as plt
%matplotlib inline
numbers = [10,90,12,16,19,12,20,26,28,30,38,35,34,45,60,68,64,62,70,78,75,
           79,85,94,95]
plt.hist(numbers, bins=[0,20,40,60,80,100],
         color='#FFF233', edgecolor='#000000')
plt.title("Histogram Demo")
plt.grid(True)
plt.xlabel("Range of values")
plt.ylabel("Freq of values")
plt.show()
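To see what the bars represent, you can inspect the values that `plt.hist` returns: the count per bin and the bin edges. A small sketch with the same numbers and bins follows; the `Agg` backend is an assumption so it runs without a display. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

numbers = [10, 90, 12, 16, 19, 12, 20, 26, 28, 30, 38, 35, 34, 45, 60, 68,
           64, 62, 70, 78, 75, 79, 85, 94, 95]
bins = [0, 20, 40, 60, 80, 100]

# plt.hist returns the bar heights, the bin edges and the drawn bars
counts, edges, patches = plt.hist(numbers, bins=bins, edgecolor="#000000")

# print each range with the number of values that fall into it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print("%3d to %3d: %d values" % (left, right, count))
```

Here the value 20 is counted in the 20 to 40 bar, not in the 0 to 20 bar.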
Box Plot and Violin Plot
A box plot helps analyze data efficiently by summarizing its
distribution: median, quartiles, outliers, etc.
A violin plot is used for large amounts of data, where showing
each individual data point is not possible.
import matplotlib.pyplot as plt
%matplotlib inline
#data
total = [20,4,1,30,20,12,20,70,32,10]
order = [10,3,2,15,17,2,30,44,2,1]
discount = [30,10,20,5,10,20,50,60,20,45]
data = [total, order, discount]
print(data)
plt.boxplot(data, showmeans=True)
plt.title("Box plot Demo")
plt.grid(True)
plt.show()
import matplotlib.pyplot as plt
%matplotlib inline
#data
total = [20,4,1,30,20,12,20,70,32,10]
order = [10,3,2,15,17,2,30,44,2,1]
discount = [30,10,20,5,10,20,50,60,20,45]
data = [total, order, discount]
print(data)
plt.violinplot(data, showmeans=True, showmedians=True)
plt.title("Violin plot Demo")
plt.grid(True)
plt.show()
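The quartiles and outliers that a box plot draws can be computed by hand. The sketch below does this for the `total` list with NumPy, assuming the common 1.5 x IQR rule for outliers (which is also Matplotlib's default whisker setting).

```python
import numpy as np

total = [20, 4, 1, 30, 20, 12, 20, 70, 32, 10]

# the box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(total, [25, 50, 75])
iqr = q3 - q1

# points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in total if v < lower_fence or v > upper_fence]

print("Q1 =", q1, "median =", median, "Q3 =", q3)
print("fences =", lower_fence, upper_fence)
print("outliers =", outliers)
```

For this data the only outlier is 70, which is the single point you should see above the first box in the box plot.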
Pie Chart, Donut Chart
import matplotlib.pyplot as plt
%matplotlib inline
#prepare the dataset
label = ['Dog','Cat','Wolf','Lion']
sizes = [50,45,60,80]
plt.pie(sizes, labels=label)
plt.title("Pie Chart Demo")
plt.show()
Customization
import matplotlib.pyplot as plt
%matplotlib inline
#prepare the dataset
label = ['Dog','Cat','Wolf','Lion']
sizes = [50,45,60,80]
#add colors
colors = ['#ff9999','#66b3ff','#99ff99','#ffcc99']
plt.pie(sizes, labels=label, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=90, explode=(0,0.1,0,0))
plt.title("Pie Chart Demo")
plt.show()
#Donut plot
import matplotlib.pyplot as plt
%matplotlib inline
group_names = ["GroupA","GroupB","GroupC"]
group_size = [20,30,50]
size_centre = [5]
#colors
colors = ['#ff9999','#66b3ff','#99ff99','#ffcc99']
# outer pie with the data, then a smaller white pie on top to form the hole
pie1 = plt.pie(group_size, labels=group_names, radius=1.5, colors=colors)
pie2 = plt.pie(size_centre, radius=1.0, colors=['w'])
plt.show()
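Another way to get a donut, shown here only as an alternative sketch, is to draw a single pie and give the wedges a width smaller than the radius through `wedgeprops`. The width `0.4` and the three colors are illustrative choices, and the `Agg` backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

group_names = ["GroupA", "GroupB", "GroupC"]
group_size = [20, 30, 50]
colors = ["#ff9999", "#66b3ff", "#99ff99"]

fig, ax = plt.subplots()
# width < radius leaves an empty centre, so no second white pie is needed
wedges = ax.pie(group_size, labels=group_names, colors=colors,
                wedgeprops=dict(width=0.4, edgecolor="w"))[0]
ax.set_title("Donut Chart Demo")
```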
Area Plots
Similar to a line plot. The only difference is that the area under
the line is colored.
import matplotlib.pyplot as plt
%matplotlib inline
#dataset
x = range(1,17)
y = [1,4,6,8,4,5,3,8,8,8,4,1,5,6,8,7]
plt.stackplot(x, y)
plt.show()
A few customizations
import matplotlib.pyplot as plt
%matplotlib inline
#dataset
x = range(1,17)
y = [1,4,6,8,4,5,3,8,8,8,4,1,5,6,8,7]
# colors takes a list, one entry per stacked series
plt.stackplot(x, y, colors=['green'], alpha=0.5)
plt.plot(x, y, color='g')
plt.grid(True)
plt.show()
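For a single series, `fill_between` is another option for coloring the area under the line. This is an alternative sketch with the same data, not the function used above, and the `Agg` backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

x = list(range(1, 17))
y = [1, 4, 6, 8, 4, 5, 3, 8, 8, 8, 4, 1, 5, 6, 8, 7]

fig, ax = plt.subplots()
# shade the region between the curve and y = 0
ax.fill_between(x, y, color="green", alpha=0.5)
# draw the line itself on top of the shaded area
ax.plot(x, y, color="g")
ax.grid(True)
```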
More Examples using pandas
DataSet: [Link]
[Link]?dl=0
1. Build a bar plot for the dataset. x-axis: Contract type, y-axis:
count
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
customer = pd.read_csv(r'[Link]')
grp=[Link].value_counts()
x=[Link]()
y=[Link](type([Link]))
plt.bar(x, y, color="orange")
plt.title("Distribution of Contract in dataset")
plt.xlabel("Contract Type of Customer")
plt.ylabel("count")
plt.show()
2. Build a Histogram. x-axis: Monthly Charges Incurred, y-axis:
count
3. Build a scatter plot of TotalCharges (x-axis) vs Tenure (y-axis).
NOTE: If the kernel keeps hanging during scatter plot visualization,
restart it and make sure you are not plotting too much data.
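One way to keep the scatter plot light is to plot a random sample of the rows instead of the whole table. The sketch below uses a synthetic DataFrame as a stand-in for the real dataset: the column names `tenure` and `TotalCharges`, the row count, and the sample size of 500 are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic stand-in for the customer dataset (hypothetical values)
rng = np.random.default_rng(0)
n_rows = 20000
tenure = rng.integers(1, 73, size=n_rows)
monthly = rng.uniform(20, 120, size=n_rows)
customer = pd.DataFrame({"tenure": tenure, "TotalCharges": tenure * monthly})

# plot only a random subset; random_state makes the sample repeatable
sample = customer.sample(n=500, random_state=0)

fig, ax = plt.subplots()
ax.scatter(sample["TotalCharges"], sample["tenure"], s=10, alpha=0.5)
ax.set_xlabel("TotalCharges")
ax.set_ylabel("Tenure")
ax.set_title("Scatter plot on a 500-row sample")
```

It also helps to check that the plotted columns are numeric. A column read as text has to be converted first, for example with `pd.to_numeric`.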
4. Build a box plot. x-axis: Payment Method of Customer, y-axis:
Monthly Charges incurred. There are 3 payment methods: Electronic
Check, Mailed Check and Bank transfer.
Try it out yourself :)
Hint :
a=customer[customer['PaymentMethod']=='Electronic Check']
b=customer[customer['PaymentMethod']=='Mailed Check']
c=customer[customer['PaymentMethod']=='Bank transfer']
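Building on the hint, the three filtered groups can be passed to a box plot as a list. The sketch below runs on a tiny made-up DataFrame, so the `MonthlyCharges` column name and all values are hypothetical. Swap in the real `customer` table, and check the exact spelling of the payment method labels in your data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical stand-in for the customer dataset
customer = pd.DataFrame({
    "PaymentMethod": ["Electronic Check", "Mailed Check", "Bank transfer",
                      "Electronic Check", "Mailed Check", "Bank transfer",
                      "Electronic Check", "Bank transfer"],
    "MonthlyCharges": [70.5, 20.0, 55.3, 89.9, 25.4, 60.1, 99.0, 45.0],
})

methods = ["Electronic Check", "Mailed Check", "Bank transfer"]
# one array of monthly charges per payment method
groups = [
    customer.loc[customer["PaymentMethod"] == m, "MonthlyCharges"].to_numpy()
    for m in methods
]

fig, ax = plt.subplots(figsize=(8, 5))
ax.boxplot(groups, showmeans=True)
ax.set_xticklabels(methods)  # boxes are drawn at positions 1, 2, 3
ax.set_xlabel("Payment Method of Customer")
ax.set_ylabel("Monthly Charges")
ax.set_title("Monthly Charges by Payment Method")
```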