DIGITAL TALENT SCHOLARSHIP 2019
14. Data Visualization with Python

digitalent.kominfo.go.id
Outline
Module 1 - Introduction to Visualization Tools
• Introduction to Data Visualization
• Introduction to Matplotlib
• Basic Plotting with Matplotlib
• Line Plots

Module 2 - Basic Visualization Tools
• Area Plots
• Histograms
• Bar Charts

Module 3 - Specialized Visualization Tools
• Pie Charts
• Box Plots
• Scatter Plots
• Bubble Plots

Module 4 - Advanced Visualization Tools
• Waffle Charts
• Word Clouds
• Seaborn and Regression Plots

Module 5 - Creating Maps and Visualizing Geospatial Data
• Introduction to Folium
• Maps with Markers
• Choropleth Maps
Module 1 - Introduction to Visualization Tools
Introduction to Data Visualization
Python Packages for Data Visualization
• Matplotlib – Python’s visualization library, modeled on MATLAB
• Docs: http://matplotlib.org/contents.html
• Plot.ly – Cloud plotting service built on D3.js
• Jupyter: https://plot.ly/ipython-notebooks/
• Pandas / Cufflinks: https://plot.ly/pandas
• Folium – Python wrapper for OpenStreetMap / Leaflet.js overlays and choropleths
• Docs: http://python-visualization.github.io/folium/
Benefits of Data Visualization
1. Quick, clear understanding of the information.
2. Identify emerging trends and act quickly based on what we see.
3. Identify relationships and patterns within digital assets.
4. Share our story (results) with others.
5. Analyze data at various levels of detail.
Module 1 - Introduction to Visualization Tools
Introduction to Matplotlib
• Matplotlib is a desktop plotting package designed for creating (mostly two-dimensional) plots.
• The project was started by John Hunter in 2002 to enable a MATLAB-like plotting interface in Python.
• Matplotlib supports various GUI backends on all operating systems and can export visualizations to all common vector and raster graphics formats (PDF, SVG, JPG, PNG, BMP, GIF, etc.).
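As a rough illustration of the export capability described above, the sketch below saves one figure in a raster format (PNG) and two vector formats (SVG, PDF) with `Figure.savefig`. The file names and the temporary output directory are assumptions made for this example only.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no GUI is needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(10))

# Hypothetical output location; any writable directory works
out_dir = tempfile.mkdtemp()
saved_files = []
for ext in ("png", "svg", "pdf"):
    path = os.path.join(out_dir, "figure." + ext)
    fig.savefig(path)  # the output format is inferred from the extension
    saved_files.append(path)

plt.close(fig)
```

The same figure object is written three times, so the plot only has to be built once regardless of how many formats are needed.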
Module 1 - Introduction to Visualization Tools
Basic Plotting with Matplotlib
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(10)
plt.plot(data)
Module 1 - Introduction to Visualization Tools
Basic Plotting with Matplotlib
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
plt.plot(np.random.randn(50).cumsum(), 'k--')
ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))
Module 1 - Introduction to Visualization Tools
Basic Plotting with Matplotlib
• pyplot.subplots options :
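The options table from this slide is not reproduced here. As a minimal sketch, the example below exercises several commonly used `pyplot.subplots` arguments (`nrows`, `ncols`, `sharex`, `sharey`, `figsize`); the sample data and the chosen values are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no GUI is needed
import matplotlib.pyplot as plt
import numpy as np

# 2 x 2 grid of axes that share both the x and the y axis
fig, axs = plt.subplots(
    nrows=2, ncols=2,
    sharex=True, sharey=True,
    figsize=(6, 4),
)

rng = np.random.default_rng(0)
for ax in axs.flat:
    ax.hist(rng.standard_normal(200), bins=20, color="k", alpha=0.5)

# Remove the spacing between the subplots
fig.subplots_adjust(wspace=0, hspace=0)
```

With `sharex=True` and `sharey=True`, all four panels use the same axis limits, which makes the histograms directly comparable.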
Module 1 - Introduction to Visualization Tools
Line Plots
import numpy as np
import matplotlib.pyplot as plt

def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

plt.figure(1)
plt.subplot(211)
plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k')

plt.subplot(212)
plt.plot(t2, np.cos(2*np.pi*t2), 'r--')
plt.show()
Module 2 - Basic Visualization Tools
Area Plots
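import matplotlib.pyplot as plt
import numpy as np

# Sample data (assumed; the slide does not define x, y1, y2)
x = np.arange(0.0, 2, 0.01)
y1 = np.sin(2 * np.pi * x)
y2 = 0.8 * np.sin(4 * np.pi * x)

fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True)

ax1.fill_between(x, 0, y1)
ax1.set_ylabel('between y1 and 0')

ax2.fill_between(x, y1, 1)
ax2.set_ylabel('between y1 and 1')

ax3.fill_between(x, y1, y2)
ax3.set_ylabel('between y1 and y2')
ax3.set_xlabel('x')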
Module 2 - Basic Visualization Tools
Histograms
Generate data and plot a simple histogram
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
from matplotlib.ticker import PercentFormatter

# Fixing random state for reproducibility


np.random.seed(19680801)

N_points = 100000
n_bins = 20

# Generate a normal distribution, center at x=0 and y=5


x = np.random.randn(N_points)
y = .4 * x + np.random.randn(100000) + 5

fig, axs = plt.subplots(1, 2, sharey=True, tight_layout=True)

# We can set the number of bins with the `bins` kwarg


axs[0].hist(x, bins=n_bins)
axs[1].hist(y, bins=n_bins)
Module 2 - Basic Visualization Tools
Histograms
Update Histogram Color
fig, axs = plt.subplots(1, 2, tight_layout=True)

# N is the count in each bin, bins holds the bin edges
N, bins, patches = axs[0].hist(x, bins=n_bins)

# We'll color code by height, but you could use any scalar
fracs = N / N.max()

# we need to normalize the data to 0..1 for the full range of the colormap
norm = colors.Normalize(fracs.min(), fracs.max())

# Now, we'll loop through our objects and set the color of each accordingly
for thisfrac, thispatch in zip(fracs, patches):
    color = plt.cm.viridis(norm(thisfrac))
    thispatch.set_facecolor(color)

# We can also normalize our inputs by the total number of counts


axs[1].hist(x, bins=n_bins, density=True)

# Now we format the y-axis to display percentage


axs[1].yaxis.set_major_formatter(PercentFormatter(xmax=1))
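One point worth checking about `density=True` in the code above: it scales the bar heights so that the total area under the histogram is 1, which is not the same as each bar showing the fraction of samples in its bin. The sketch below uses NumPy's `np.histogram` with assumed sample data to compare the two normalizations.

```python
import numpy as np

rng = np.random.default_rng(19680801)
x = rng.standard_normal(100_000)  # assumed sample data

# density=True: heights are scaled so that sum(height * bin_width) == 1
density, edges = np.histogram(x, bins=20, density=True)
area = float(np.sum(density * np.diff(edges)))

# Per-bin fractions: weight every sample by 1/N so the heights sum to 1
weights = np.full_like(x, 1.0 / x.size)
fractions, _ = np.histogram(x, bins=edges, weights=weights)
total_fraction = float(fractions.sum())
```

If the y-axis is meant to read as a percentage of samples per bin, the `weights` variant is the one that matches a `PercentFormatter(xmax=1)` axis.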
Module 2 - Basic Visualization Tools
Histograms
Plot 2D Histogram
fig, ax = plt.subplots(tight_layout=True)
hist = ax.hist2d(x, y)
Module 2 - Basic Visualization Tools
Histograms
Customizing 2D Histogram
fig, axs = plt.subplots(3, 1, figsize=(5, 15), sharex=True,
sharey=True,
tight_layout=True)

# We can increase the number of bins on each axis


axs[0].hist2d(x, y, bins=40)

# As well as define normalization of the colors


axs[1].hist2d(x, y, bins=40, norm=colors.LogNorm())

# We can also define custom numbers of bins for each axis


axs[2].hist2d(x, y, bins=(80, 10), norm=colors.LogNorm())

plt.show()
Module 2 - Basic Visualization Tools
Bar Charts
Sample bar charts 1
import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()  # reset Matplotlib settings to their defaults

objects = ('Python', 'C++', 'Java', 'Perl', 'Scala', 'Lisp')


y_pos = np.arange(len(objects))
performance = [10,8,6,4,2,1]

plt.bar(y_pos, performance, align='center', alpha=0.5)


plt.xticks(y_pos, objects)
plt.ylabel('Usage')
plt.title('Programming language usage')

plt.show()
Module 2 - Basic Visualization Tools
Bar Charts
Sample bar charts 2
import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()  # reset Matplotlib settings to their defaults

objects = ('Python', 'C++', 'Java', 'Perl', 'Scala', 'Lisp')


y_pos = np.arange(len(objects))
performance = [10,8,6,4,2,1]

plt.barh(y_pos, performance, align='center', alpha=0.5)


plt.yticks(y_pos, objects)
plt.xlabel('Usage')
plt.title('Programming language usage')

plt.show()
Module 2 - Basic Visualization Tools
Bar Charts
Sample bar charts 3
import numpy as np
import matplotlib.pyplot as plt

# data to plot
n_groups = 4
means_frank = (90, 55, 40, 65)
means_guido = (85, 62, 54, 20)

# create plot
fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.35
opacity = 0.8

rects1 = plt.bar(index, means_frank, bar_width,
                 alpha=opacity, color='b',
                 label='Frank')

rects2 = plt.bar(index + bar_width, means_guido, bar_width,
                 alpha=opacity, color='g',
                 label='Guido')

plt.xlabel('Person')
plt.ylabel('Scores')
plt.title('Scores by person')
plt.xticks(index + bar_width, ('A', 'B', 'C', 'D'))
plt.legend()

plt.tight_layout()
Module 2 - Basic Visualization Tools
Bar Charts
More bar charts : See barchart_demo.py
Module 3 - Specialized Visualization Tools
Pie Charts
Sample Pie charts 1
import matplotlib.pyplot as plt

# Data to plot
labels = 'Python', 'C++', 'Ruby', 'Java'
sizes = [215, 130, 245, 210]
colors = ['gold', 'yellowgreen', 'lightcoral',
'lightskyblue']
explode = (0.1, 0, 0, 0) # explode 1st slice

# Plot
plt.pie(sizes, explode=explode, labels=labels,
colors=colors,
autopct='%1.1f%%', shadow=True, startangle=140)

plt.axis('equal')
plt.show()
Module 3 - Specialized Visualization Tools
Pie Charts
Sample Pie charts 2
import matplotlib.pyplot as plt

labels = ['Cookies', 'Jellybean', 'Milkshake', 'Cheesecake']
sizes = [38.4, 40.6, 20.7, 10.3]
colors = ['yellowgreen', 'gold', 'lightskyblue',
'lightcoral']
patches, texts = plt.pie(sizes, colors=colors,
shadow=True, startangle=90)
plt.legend(patches, labels, loc="best")
plt.axis('equal')
plt.tight_layout()
plt.show()
Module 3 - Specialized Visualization Tools
Box Plots
Sample Box Plots 1: See boxplot1.py
Module 3 - Specialized Visualization Tools
Box Plots
Sample Box Plots 2: See boxplot2.py
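The referenced boxplot1.py and boxplot2.py files are not reproduced in these slides. As a minimal sketch of what a box plot looks like in Matplotlib, the example below draws one box per sample with `Axes.boxplot`; the three random samples and their labels are assumptions and are not taken from those files.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no GUI is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(19680801)

# Three hypothetical samples with increasing spread
samples = [rng.normal(loc=0.0, scale=s, size=100) for s in (1.0, 2.0, 3.0)]

fig, ax = plt.subplots()
result = ax.boxplot(samples)  # boxes are placed at x = 1, 2, 3 by default

ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["std 1", "std 2", "std 3"])
ax.set_ylabel("value")
ax.set_title("Box plot sketch")
```

The returned dictionary (`result`) holds the drawn artists, such as `result["boxes"]` and `result["medians"]`, which can be restyled afterwards.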
Module 3 - Specialized Visualization Tools
Scatter Plots
Sample Scatter Plots 1
import numpy as np
import matplotlib.pyplot as plt

# Create data
N = 500
x = np.random.rand(N)
y = np.random.rand(N)
colors = [(0, 0, 0)]  # one RGB color (black), wrapped in a list so it is not read as three data values
area = np.pi*3

# Plot
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.title('Scatter plot pythonspot.com')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
Module 3 - Specialized Visualization Tools
Scatter Plots
Sample Scatter Plots 2
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))

plt.legend(numpoints=1)
plt.xlim(0, 1.8)
Module 3 - Specialized Visualization Tools
Bubble Plots
Sample Bubble Plots 1
import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility


np.random.seed(19680801)

N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2 # 0 to 15 point radii

plt.scatter(x, y, s=area, c=colors, alpha=0.5)


plt.show()
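The `area` comment in the code above treats `30 * np.random.rand(N)` as a marker diameter, so the radii range from 0 to 15 points. The sketch below makes that conversion explicit, assuming the default circular marker and that `s` is given in points squared; the helper name `radius_to_size` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no GUI is needed
import matplotlib.pyplot as plt
import numpy as np


def radius_to_size(radius_pt):
    """Convert a marker radius in points to a scatter `s` value (points**2).

    Hypothetical helper: assumes the marker width equals sqrt(s).
    """
    return (2.0 * np.asarray(radius_pt, dtype=float)) ** 2


radii = np.array([2.0, 5.0, 15.0])
sizes = radius_to_size(radii)

fig, ax = plt.subplots()
bubbles = ax.scatter([0.2, 0.5, 0.8], [0.5, 0.5, 0.5], s=sizes, alpha=0.5)
```

Because `s` grows with the square of the radius, encoding a data value directly in `s` makes the bubble area, not its radius, proportional to that value.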
Module 3 - Specialized Visualization Tools
Bubble Plots
Sample Bubble Plots 2
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
features = iris.data.T

# Marker size encodes petal width; color encodes the species
plt.scatter(features[0], features[1], alpha=0.2,
            s=100*features[3], c=iris.target, cmap='viridis')

plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
Module 3 - Specialized Visualization Tools
Bubble Plots
Sample Bubble Plots 3 (in polar coordinates)
import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility


np.random.seed(19680801)

# Compute areas and colors


N = 150
r = 2 * np.random.rand(N)
theta = 2 * np.pi * np.random.rand(N)
area = 200 * r**2
colors = theta

fig = plt.figure()
ax = fig.add_subplot(111, projection='polar')
c = ax.scatter(theta, r, c=colors, s=area, cmap='hsv', alpha=0.75)
Module 4 - Advanced Visualization Tools
Waffle Charts
Requires the pywaffle library
Installation: pip install pywaffle

Links : https://pypi.org/project/pywaffle/
Module 4 - Advanced Visualization Tools
Waffle Charts
Sample Waffle Plots 1
import matplotlib.pyplot as plt
from pywaffle import Waffle

# The values are rounded to 10 * 5 blocks
fig = plt.figure(
    FigureClass=Waffle,
    rows=5,
    columns=10,
    values=[48, 46, 3]
)
plt.show()
Module 4 - Advanced Visualization Tools
Waffle Charts
Sample Waffle Plots 2
import matplotlib.pyplot as plt
from pywaffle import Waffle
data = {'Democratic': 48, 'Republican': 46, 'Libertarian': 3}
fig = plt.figure(
FigureClass=Waffle,
rows=5,
values=data,
legend={'loc': 'upper left', 'bbox_to_anchor': (1.1, 1)}
)
plt.show()
Module 4 - Advanced Visualization Tools
Waffle Charts
Sample Waffle Plots 3
import matplotlib.pyplot as plt
from pywaffle import Waffle

data = {'Democratic': 48, 'Republican': 46, 'Libertarian': 3}
fig = plt.figure(
    FigureClass=Waffle,
    rows=5,
    values=data,
    colors=("#983D3D", "#232066", "#DCB732"),
    title={'label': 'Vote Percentage in 2016 US Presidential Election',
           'loc': 'left'},
    labels=["{0} ({1}%)".format(k, v) for k, v in data.items()],
    legend={'loc': 'lower left', 'bbox_to_anchor': (0, -0.4),
            'ncol': len(data), 'framealpha': 0},
    plot_direction='NW'
)
fig.gca().set_facecolor('#EEEEEE')
fig.set_facecolor('#EEEEEE')
plt.show()
Module 4 - Advanced Visualization Tools
Word Clouds
Requires matplotlib, pandas, wordcloud, and Pillow

Installation
pip install matplotlib
pip install pandas
pip install wordcloud
pip install Pillow
Module 4 - Advanced Visualization Tools
Word Clouds
Sample Word Clouds1
# Libraries
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Create the text to visualize
text = ("Python Python Python Matplotlib Matplotlib Seaborn Network Plot Violin Chart Pandas "
        "Datascience Wordcloud Spider Radar Parrallel Alpha Color Brewer Density Scatter Barplot "
        "Barplot Boxplot Violinplot Treemap Stacked Area Chart Chart Visualization Dataviz Donut "
        "Pie Time-Series Wordcloud Wordcloud Sankey Bubble")

# Create the wordcloud object


wordcloud = WordCloud(width=480, height=480, margin=0).generate(text)

# Display the generated image:


plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
Module 4 - Advanced Visualization Tools
Word Clouds
Sample Word Clouds2 (custom shape)
# Libraries
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image # to import the image

# Create the text to visualize (https://en.wikipedia.org/wiki/Data_visualization)
text = ("Data visualization or data visualisation is viewed by many disciplines as a modern "
        "equivalent of visual communication. It involves the creation and study of the visual "
        "representation of data, meaning information that has been abstracted in some schematic form, "
        "including attributes or variables for the units of information. A primary goal of data "
        "visualization is to communicate information clearly and efficiently via statistical graphics, "
        "plots and information graphics. Numerical data may be encoded using dots, lines, or bars, to "
        "visually communicate a quantitative message. Effective visualization helps users analyze and "
        "reason about data and evidence. It makes complex data more accessible, understandable and "
        "usable. Users may have particular analytical tasks, such as making comparisons or understanding "
        "causality, and the design principle of the graphic (i.e., showing comparisons or showing "
        "causality) follows the task. Tables are generally used where users will look up a specific "
        "measurement, while charts of various types are used to show patterns or relationships in the "
        "data for one or more variables")
Module 4 - Advanced Visualization Tools
Word Clouds
Sample Word Clouds2 (custom shape)
# Load the image (http://python-graph-gallery.com/wp-content/uploads/wave.jpg)
wave_mask = np.array(Image.open( "wave.jpg"))

# Make the figure


wordcloud = WordCloud(mask=wave_mask).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
Module 4 - Advanced Visualization Tools
Seaborn and Regression Plots
Sample seaborn 1
import seaborn as sns; sns.set(color_codes=True)
tips = sns.load_dataset("tips")
ax = sns.regplot(x="total_bill", y="tip", data=tips)
Module 4 - Advanced Visualization Tools
Seaborn and Regression Plots
Sample seaborn 2
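import numpy as np
import pandas as pd
# Assumption: the slide does not define x and y; random sample data is used here
x, y = np.random.randn(2, 80)
x, y = pd.Series(x, name="x_var"), pd.Series(y, name="y_var")
ax = sns.regplot(x=x, y=y, marker="+")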
Module 5 - Creating Maps and Visualizing Geospatial Data
Introduction to Folium
import folium

m = folium.Map(location=[45.5236, -122.6750])

folium.Map(
    location=[45.5236, -122.6750],
    tiles='Stamen Toner',
    zoom_start=13
)
Module 5 - Creating Maps and Visualizing Geospatial Data
Introduction to Folium
More About Folium (Use Jupyter Notebook):
https://python-visualization.github.io/folium/quickstart.html#
https://nbviewer.jupyter.org/github/python-visualization/folium/tree/master/examples/
Hands On : Let’s Try the Folium Examples


Attribution :
FOLLOW US

digitalent.kominfo
digitalent.kominfo
DTS_kominfo
Digital Talent Scholarship 2019

Pusat Pengembangan Profesi dan Sertifikasi


Badan Penelitian dan Pengembangan SDM
Kementerian Komunikasi dan Informatika
Jl. Medan Merdeka Barat No. 9
(Gd. Belakang Lt. 4 - 5)
Jakarta Pusat, 10110

digitalent.kominfo.go.id
