
Visualizing Data

May 24, 2021

1 Visualizing Data
A fundamental part of the data scientist’s toolkit is data visualization. Although it is very easy to
create visualizations, it’s much harder to produce good ones.

1.1 matplotlib
Many tools exist for visualizing data. We will use matplotlib, a widely used Python 2D plotting
library that produces publication-quality figures in a variety of hardcopy formats and interactive
environments across platforms.
Checking Matplotlib Version
[ ]: import matplotlib

print(matplotlib.__version__)

3.4.2

1.1.1 Line Graph


Plotting a Simple Line Graph
[ ]: # Plotting a Simple Line Graph
import matplotlib.pyplot as plt
squares = [1, 4, 9, 16, 25]
plt.plot(squares)
plt.show()

Changing the Label Type and Graph Thickness
[ ]: # Changing the Label Type and Graph Thickness
import matplotlib.pyplot as plt

squares = [1, 4, 9, 16, 25]

plt.plot(squares, linewidth=3)

# Set chart title and label axes.


plt.title("Square Numbers", fontsize=16)
# loc = 'left'

plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

#Set size of tick labels.


plt.tick_params(axis='both', labelsize=14)
plt.show()

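Correcting the Plot
When plot() receives a single list, it uses the list indices 0, 1, 2, ... as the x-values, so the
square of 1 is drawn at x = 0. Passing the input values explicitly fixes this: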
[ ]: # Correcting the Plot
import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]

plt.plot(input_values, squares, linewidth=3)

# Set chart title and label axes.


plt.title("Square Numbers", fontsize=16)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

#Set size of tick labels.


plt.tick_params(labelsize=16)
plt.show()
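
The difference between the two calls can be checked directly by asking each line for its x-values. This is a minimal sketch that reuses the data from the cell above; the names `default_line`, `explicit_line`, `default_x`, and `explicit_x` are illustrative and not part of the original notebook.

```python
import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]

# With a single list, plot() falls back to the indices 0, 1, 2, ... as x-values.
default_line, = plt.plot(squares)
default_x = [int(v) for v in default_line.get_xdata()]

# Passing the x-values explicitly pairs each input with its square.
explicit_line, = plt.plot(input_values, squares)
explicit_x = [int(v) for v in explicit_line.get_xdata()]

print(default_x)   # [0, 1, 2, 3, 4]
print(explicit_x)  # [1, 2, 3, 4, 5]

plt.show()
```

Both lines are drawn on the same figure, so the one-unit horizontal shift between them is visible as well.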

Markers
[ ]: # Markers
import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]

plt.plot(input_values, squares, marker='8')


# change marker to - *, v, ^, d, D

# Set chart title and label axes.


plt.title("Square Numbers", fontsize=16)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

#Set size of tick labels.


plt.tick_params(axis='both', labelsize=12)
plt.show()
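
Rather than editing the marker argument by hand, the marker codes suggested in the comment above can be compared in one figure. The sketch below is only one way to do this: the vertical offset of 5 per curve and the name `used_markers` are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
markers = ['*', 'v', '^', 'd', 'D']  # marker codes from the comment above

plt.figure()
for offset, marker in enumerate(markers):
    # Shift each copy of the curve upward so the markers do not overlap.
    shifted = [value + 5 * offset for value in squares]
    plt.plot(input_values, shifted, marker=marker, label=f"marker='{marker}'")

plt.title("Square Numbers with Different Markers", fontsize=16)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value (shifted)", fontsize=14)
plt.legend()

# Read the marker back from each plotted line.
used_markers = [line.get_marker() for line in plt.gca().get_lines()]

plt.show()
```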

Linestyle
You can use the keyword argument linestyle, or shorter ls, to change the style of the plotted line:
[ ]: # Linestyle
import matplotlib.pyplot as plt

squares = [1, 4, 9, 16, 25]

plt.plot(squares, marker='o', ls='-.')


# change it to 'dashed'

# Set chart title and label axes.


plt.title("Square Numbers", fontsize=16)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

#Set size of tick labels.


plt.tick_params(axis='both', labelsize=12)
plt.show()

• linestyle can be shortened to ls
• dotted can be written as ':'
• dashed can be written as '--'

Style              Short form
'solid' (default)  '-'
'dotted'           ':'
'dashed'           '--'
'dashdot'          '-.'
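
The long names and short codes are interchangeable. The sketch below draws each style twice, once by name and once by code, so the two can be compared; the offsets, line widths, and the name `pairs` are illustrative choices rather than part of the original notebook.

```python
import matplotlib.pyplot as plt

squares = [1, 4, 9, 16, 25]

# Each long style name paired with its short code.
styles = [('solid', '-'), ('dotted', ':'), ('dashed', '--'), ('dashdot', '-.')]

plt.figure()
pairs = []
for offset, (long_name, short_code) in enumerate(styles):
    shifted = [value + 10 * offset for value in squares]
    # Same data twice: a thick faint line by name, a thin black line by code.
    by_name, = plt.plot(shifted, linestyle=long_name, linewidth=4, alpha=0.3)
    by_code, = plt.plot(shifted, ls=short_code, c='black')
    pairs.append((by_name.get_linestyle(), by_code.get_linestyle()))

plt.title("Line Styles", fontsize=16)
plt.show()
```

In each pair the thin black line should sit exactly on top of the thick faint one, since both spellings select the same dash pattern.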

Line Color
You can use the keyword argument color or the shorter c to set the color of the line:
[ ]: # Reuses plt and squares from the cells above
plt.plot(squares, c='green')
# c is the short form of color
plt.show()

Multiple Lines
You can plot as many lines as you like by simply adding more plt.plot() calls:

[ ]: import matplotlib.pyplot as plt

y1 = [3, 8, 1, 10]
y2 = [6, 2, 7, 11]
y3 = [5, 1, 12, 13]

plt.plot(y1, c = 'r')
plt.plot(y2, c = 'b')
plt.plot(y3)

plt.show()

Grid Lines
With Pyplot, you can use the grid() function to add grid lines to the plot.

[ ]: import matplotlib.pyplot as plt

subjects = ['Maths', 'Phy', 'Chem', 'CS']


marks = [83, 75, 89, 76]

plt.plot(subjects, marks, linewidth=3)

# Set chart title and label axes.


plt.title("Marks Details:", fontsize=16)
plt.xlabel("Subjects", fontsize=14)
plt.ylabel("Marks", fontsize=14)

#Set size of tick labels.


plt.tick_params(axis='both', labelsize=12)
plt.grid()

plt.show()

TRY IT YOURSELF
Change the attributes of the grid, such as its color, line style, and line width.
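
As a starting point for this exercise, grid() accepts an axis argument plus the usual line properties. The sketch below shows one possible combination; the specific color, style, and width are arbitrary choices, not values from the original notebook.

```python
import matplotlib.pyplot as plt

subjects = ['Maths', 'Phy', 'Chem', 'CS']
marks = [83, 75, 89, 76]

plt.figure()
plt.plot(subjects, marks, linewidth=3)
plt.title("Marks Details", fontsize=16)
plt.xlabel("Subjects", fontsize=14)
plt.ylabel("Marks", fontsize=14)

# Draw only the horizontal grid lines (those belonging to the y-axis)
# as thin, gray, dashed lines.
plt.grid(axis='y', color='gray', linestyle='--', linewidth=0.5)

plt.show()
```

Setting axis='x' draws only the vertical lines instead, and axis='both' (the default) draws both sets.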
Matplotlib Subplots
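Display Multiple Plots
With the subplot() function you can draw multiple plots in one figure. Its three arguments are the
number of rows, the number of columns, and the position of the current plot: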

[ ]: import matplotlib.pyplot as plt

#plot 1:
x = [0, 1, 2, 3]
y = [3, 8, 1, 10]

plt.subplot(1, 3, 2)
plt.title("First")
plt.plot(x,y)

#plot 2:
x = [0, 1, 2, 3]
y = [10, 20, 30, 40]

plt.subplot(1, 3, 3)
plt.title("Second")
plt.plot(x,y)

#plot 3:
x = [4, 3, 2, 1]
y = [16, 9, 4, 1]

plt.subplot(1, 3, 1)
plt.title("THIRD")
plt.plot(x,y)

plt.suptitle("Three plots")
plt.show()
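
Matplotlib also provides plt.subplots() (note the plural), which creates the figure and all of its axes in one call and returns them as objects. The sketch below redraws the three plots above in that style; the `datasets` list, the figure size, and the left-to-right ordering are illustrative assumptions.

```python
import matplotlib.pyplot as plt

# (title, x-values, y-values) for each panel, taken from the cell above.
datasets = [
    ("First", [0, 1, 2, 3], [3, 8, 1, 10]),
    ("Second", [0, 1, 2, 3], [10, 20, 30, 40]),
    ("Third", [4, 3, 2, 1], [16, 9, 4, 1]),
]

# One row, three columns; axes holds one Axes object per column.
fig, axes = plt.subplots(1, 3, figsize=(9, 3))

for ax, (title, x, y) in zip(axes, datasets):
    ax.plot(x, y)
    ax.set_title(title)

fig.suptitle("Three plots")
fig.tight_layout()
plt.show()
```

Because each panel is addressed through its own Axes object, the order of the plotting calls no longer determines where a plot lands.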

TRY IT YOURSELF
Draw two or more graphs as subplots arranged horizontally.

1.1.2 Scatter Plot


[ ]: import matplotlib.pyplot as plt
plt.scatter(2, 4)
plt.show()

[ ]: import matplotlib.pyplot as plt

plt.scatter(2, 4, s=200, marker='1')

# Set chart title and label axes.


plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set size of tick labels.


plt.tick_params(axis='both', labelsize=14)

plt.show()

[ ]: import matplotlib.pyplot as plt

x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]

plt.scatter(x_values, y_values)

# Set chart title and label axes.


plt.title("Square Numbers", fontsize=16)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set size of tick labels.


plt.tick_params(axis='both', labelsize=14)

plt.show()

Calculating Data Automatically
[ ]: import matplotlib.pyplot as plt

x_values = list(range(1, 801, 13))


y_values = [x**2 for x in x_values]

plt.scatter(x_values, y_values, s=20)

# Set the range for each axis.


plt.axis([0, 900, 0, 900000])
plt.show()

[ ]: # Compare Plots
import matplotlib.pyplot as plt

x1 = list(range(1, 30, 2))


y1 = [x*2 for x in x1]

x2 = list(range(0, 30, 2))


y2 = [x*3 for x in x2]

plt.scatter(x1, y1, s=20)

plt.scatter(x2, y2, s=20)

plt.show()

[ ]: import matplotlib.pyplot as plt

#day one, the age and speed of 13 cars:


x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y, color='r')

#day two, the age and speed of 15 cars:


x = [2,2,8,1,15,8,12,9,7,3,11,4,7,14,12]
y = [100,105,84,105,90,99,90,95,94,100,79,112,91,80,85]
plt.scatter(x, y, color='g')

plt.show()

Using a Colormap
[ ]: import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

colors = [0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100]

plt.scatter(x, y, c=colors, cmap='Greens')


plt.show()
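
A colormap is easier to read when the figure shows which shade corresponds to which number. One way to do this, sketched here with the same data, is plt.colorbar(); the label text "Color value" and the names `points` and `colorbar` are assumptions for illustration.

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
colors = [0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100]

plt.figure()
points = plt.scatter(x, y, c=colors, cmap='Greens')

# colorbar() adds a scale that maps each shade back to its number.
colorbar = plt.colorbar(points)
colorbar.set_label("Color value")

plt.show()
```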

TRY IT YOURSELF
Cubes: A number raised to the third power is a cube. Plot the first five cubic numbers, and then
plot the first 5000 cubic numbers, using a scatter plot with a colormap.

1.1.3 Bars
With Pyplot, you can use the bar() function to draw bar graphs:

[ ]: import matplotlib.pyplot as plt

x = ["MATHS", "SCIENCE", "ENGLISH", "HINDI"]


y = [49, 35, 41, 31]

plt.bar(x, y, color='g')

# You can use plt.yticks
#plt.yticks(range(0, 101, 10))
plt.show()

[ ]: plt.barh(x, y)
plt.show()

[ ]: plt.bar(x, y, width = 0.5)


plt.show()

1.1.4 Legends
Plot legends give meaning to a visualization.
[50]: import matplotlib.pyplot as plt
x = list(range(0, 11))
y = [2 * v for v in x]

plt.figure(dpi=150)
plt.plot(x, y, c='b', marker='^', label = '$2x$')

y2 = [v**2 for v in x]

plt.plot(x[0:3], y2[0:3], c='r')


plt.plot(x[2:], y2[2:], c='r', ls='--', marker='o', label='$x^2$')

plt.xlabel('X Axis')
plt.ylabel('Y Axis')

plt.xticks(range(0,11))
plt.yticks(range(0,101, 10))

plt.grid()

plt.title("Legend Example")
plt.legend(fancybox=True, framealpha=1.0, shadow=True, borderpad=1)

plt.show()
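
Only lines that were given a label appear in the legend, which is why the unlabelled solid segment of the red curve above is left out. The position and heading of the legend can also be set; the sketch below uses the loc and title arguments of legend(), with 'upper left' and 'Functions' chosen purely for illustration.

```python
import matplotlib.pyplot as plt

x = list(range(0, 11))

plt.figure()
plt.plot(x, [2 * v for v in x], c='b', marker='^', label='$2x$')
plt.plot(x, [v**2 for v in x], c='r', ls='--', marker='o', label='$x^2$')

# loc picks the corner of the axes; title adds a heading inside the legend box.
legend = plt.legend(loc='upper left', title='Functions')

plt.title("Legend Placement Example")
plt.show()
```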

1.1.5 Pie Chart
[3]: import matplotlib.pyplot as plt

#plt.figure(dpi=100)

y = [35, 25, 25, 20, 15, 10, 4]

mylabels = ["A", "B", "C", "D", "E", "F", "G"]


#plt.pie(y, labels = mylabels, autopct='%.2f%%')

myexplode = [0.1, 0.1, 0.1, 0.1, 0.1, 0.3, 0.3]


plt.pie(y, labels = mylabels, explode = myexplode,
        shadow = True, autopct='%.2f%%', counterclock=False,
        radius = 2, startangle=90)

plt.title("Fruits", loc="left")
#plt.legend(title='Fruits',loc='upper left')
plt.show()
