
4/29/2021 Which library should I use for my Python dashboard?

| by Antoine Hue | Towards Data Science


Which library should I use for my Python dashboard?
From the early prototypes in notebooks, up to production

Antoine Hue Aug 31, 2020 · 13 min read

When it comes to data visualization, there are many possible tools: Matplotlib, Plotly,
Bokeh… Which one fits my short-term goals within a notebook and is also a good
choice for the longer term, in production? And what does production mean?

Now that you have a nice machine learning model, or you have completed some data
mining or analysis, you need to present and promote this amazing work. You may
initially reuse some notebooks to produce a few charts… but soon colleagues or clients
are requesting access to the data or are asking for other views or parameters. What
should you do? Which tools and libraries should you use? Is there a one-size-fits-all
solution for all stages of your work?

Data visualization has a very wide scope, ranging from simple charts included in a
report to complex interactive dashboards. The former is within reach of anybody who
knows Excel, whereas the latter is closer to a software product that may require the
full software development cycle and methodology.


In between these two extreme cases, data scientists face many choices that are not
trivial. This post raises some of the questions that come up along the way and offers
tips and answers. The chosen starting point is Python within a Jupyter notebook; the
target is a Web dashboard in production.

To the target © Pixabay

Which library for the chart type do I want?


Getting the right chart type is always the first issue we think about.

You have a great new idea for the data visualization, your boss is in love with Sunburst
graphs, but is this doable with the charting libraries you are using?

Mainstream open-source charting libraries in Python (Matplotlib with Seaborn, Plotly,
Bokeh) support more or less the same set of chart types. They also support pixel
matrices, which allow for extensions like displaying word clouds.

Here is a sample of drawing a word cloud from the Word Cloud Python library
documentation¹:

import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud

text = "square"

# Build a 300x300 mask: 255 (masked out) outside a circle of radius 130, 0 inside
x, y = np.ogrid[:300, :300]
mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
mask = 255 * mask.astype(int)

# repeat=True repeats the word until max_words or min_font_size is reached
wc = WordCloud(background_color="white", repeat=True, mask=mask)
wc.generate(text)

# The word cloud is rendered as an image, hence imshow
plt.axis("off")
plt.imshow(wc, interpolation="bilinear")
plt.show()

Network graphs
Network graphs are a specific category that is not natively handled by the libraries
listed above. The main Python library for networks is NetworkX. By default, NetworkX
uses Matplotlib as a backend for drawing². Graphviz (Pygraphviz) is the de facto
standard graph drawing library and can be coupled with NetworkX³. With quite a few
lines of code, you may also use Plotly to draw the graph⁴. The integration between
NetworkX and Bokeh is also documented⁵.
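As an illustration of the default NetworkX and Matplotlib pairing, here is a minimal sketch. The sample graph, the layout seed and the styling values are assumptions chosen for the example, not taken from the referenced galleries.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Sample graph shipped with NetworkX (34 nodes); it stands in for your own network
G = nx.karate_club_graph()

# Compute 2D positions once; the seed makes the layout reproducible
pos = nx.spring_layout(G, seed=42)

fig, ax = plt.subplots(figsize=(6, 6))
# Node colour encodes the degree (number of neighbours) of each node
degrees = [G.degree(n) for n in G.nodes]
nx.draw(G, pos, ax=ax, node_color=degrees, cmap="viridis",
        node_size=120, edge_color="lightgray")
```

In a notebook the figure is displayed directly; elsewhere, `fig.savefig(...)` writes it to a file.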

Geographic plots
Geographically located information and maps are also a specific subfield of data
visualization. Plotting maps poses many challenges, among them:

Handling large and complex contours (e.g. country borders)

Showing more or less information depending on the zoom, also known as the level of
detail. This affects both the readability of the plot and its complexity, which
translates into loading latency and memory footprint

Handling geographical coordinates, that is, single or multiple projections from
non-Euclidean spaces to the 2D Euclidean space (e.g. latitude-longitude to UTM with
a given UTM zone)

Availability of information in multiple file formats, even if there are de facto
standards like GeoJSON and Shapefile

Depending on the chosen plotting library, you may have to do little or a lot of
pre-processing. The common open-source library to deal with geographical coordinates
is Proj; GDAL deals with file formats and with translation between formats or
coordinate systems.
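To give a feel for what a projection involves, here is a minimal sketch in plain Python. It uses the spherical Web Mercator formula rather than the UTM projection mentioned above, because it fits in a few lines; the function name and the sample coordinates are assumptions for the example. In a real project, Proj would normally do this work.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, as used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Paris, approximate coordinates
x, y = lonlat_to_web_mercator(2.3522, 48.8566)
print(f"x = {x:.0f} m, y = {y:.0f} m")
```

The formula diverges near the poles, which is one reason why the choice of projection depends on the area being mapped.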

Matplotlib has no direct support for plotting maps; it relies on pixel matrices (a
raster image), as explained in the gallery⁶. This is a stopgap that you will want to
avoid unless your only target is a static image.

Plotly demonstrates map plots⁷ based on Mapbox. Many features are available, but at
least one is missing: the management of the level of detail in scatter plots.

Bokeh has some support for maps, including the integration of Google Maps⁸, but it
seems quite crude.

Map with areas (contours), clusters of markers, hover information created with Folium © the author

Folium is a Python library wrapping the Leaflet JavaScript library. Leaflet is used in
many collaborative and commercial websites, for example OpenStreetMap. Folium is
very effective for drawing maps from open data.


The open-source reference for geographic data manipulation is QGIS from OSGeo. It is a
desktop application, but it includes a Python console, and it is also possible to use
the PyQGIS API⁹ directly.

Dataframes
Pandas and its DataFrame are must-haves for data science in Python. What is their
impact on chart and dashboard creation?

On the one hand, Pandas DataFrames and Series have a plot API. By default, it relies
on Matplotlib as the backend. However, Plotly is also available as a graphical backend
for Pandas¹⁰, and support for Bokeh is available through a GitHub project¹¹.
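A minimal sketch of the Pandas plot API follows. The data and column names are hypothetical; the runnable part uses the default Matplotlib backend, and the Plotly variant is only indicated in comments because it requires Plotly to be installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import pandas as pd

# Hypothetical data: daily visits for two pages
df = pd.DataFrame(
    {"home": [120, 135, 150, 90], "pricing": [30, 42, 38, 25]},
    index=pd.date_range("2021-01-04", periods=4, freq="D"),
)

# Default backend: Matplotlib. The call returns a Matplotlib Axes object.
ax = df.plot(title="Daily visits")
ax.set_ylabel("visits")

# With Plotly installed, the same call can return a Plotly figure instead:
#   fig = df.plot(backend="plotly")
# or, for the whole session:
#   pd.options.plotting.backend = "plotly"
```

Because the returned object depends on the backend, code that customizes the chart after `df.plot()` is not portable from one backend to another.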

On the other hand, the plots of Pandas might not fit your requirements, and you may
wonder how to use dataframes in plots beyond passing columns and series as vectors.
Plotly Express has this capability, with support for column-oriented data¹².

Last resort, d3.js


If none of these libraries and their extensions handles the chart you are looking
for, then you may switch to d3.js, which is the base charting library for browsers. It
means that you would leave the Python world and enter JavaScript’s domain.
Possibilities and customization are vast, as shown in the example gallery¹³. However, you
will need to handle many aspects of the graph that come for free in other libraries: axes,
legend, interactivity…


Scatter matrix designed with Matplotlib and Seaborn © the author

How can I build a view with multiple charts?


In the design of a dataviz, the question of layout, or composition of charts, arises as
soon as multiple charts are needed to display several features. You have probably already
experienced the pluses and minuses of Matplotlib’s subplots, starting with quirky
imperative commands like “`plt.subplot(1, 2, 1)`” or the even weirder equivalent
“`plt.subplot(121)`”. If this is enough to reach your goal, I would still suggest using
the alternative, cleaner “`plt.subplots()`” API, which returns a figure and an array of
axes. You might nevertheless feel limited, not only in interactivity (dealt with in the
next section) but also in layout capabilities.
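For reference, a minimal sketch of the `plt.subplots()` style could look as follows; the plotted functions and the figure size are arbitrary choices for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# One call creates the figure and an array of two Axes; sharey keeps the scales aligned
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
axes[0].plot(x, np.sin(x))
axes[0].set_title("sin")
axes[1].plot(x, np.cos(x))
axes[1].set_title("cos")
fig.tight_layout()
```

Each chart is addressed through its own Axes object, so there is no hidden "current subplot" to keep track of.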

Enhancements to layouts
Matplotlib allows for uneven widths and heights using calls to the Figure.add_subplot
method, like “`fig.add_subplot(3, 1, (1, 2))`”, which makes a subplot that spans the
upper 2/3 of the figure¹⁴. Seaborn introduces one enhancement, the scatter matrix¹⁵.
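The uneven layout described above can be sketched as follows. The random data and the choice of a histogram for the lower panel are assumptions made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=500).cumsum()

fig = plt.figure(figsize=(6, 5))
# 3x1 grid: the first Axes spans cells 1 to 2, the second one takes cell 3
ax_main = fig.add_subplot(3, 1, (1, 2))
ax_hist = fig.add_subplot(3, 1, 3)

ax_main.plot(values)
ax_hist.hist(np.diff(values), bins=30)
```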

Plotly allows for similar capabilities, including uneven sub-plots. However, I find the
API rather limited. For example, it is not possible to set the font size of sub-plot
titles or to share a legend. Bokeh has similar capabilities¹⁶.

The Plotly Express API goes further with marginal distribution plots, as histograms or
rugs¹⁷, and also a synchronized overview-detail chart called a “range slider”¹⁸. This
leads us to the interactivity of graphs, which is detailed in the next section.

A way forward
But what if these layout helpers are not enough for my purpose? Possible answers are
many, ranging from the Plotly Dash solution to full HTML layout or SVG custom design
with d3.js.

Plotly Dash offers an intermediate solution in which you stick to Python but can
build more complex and more interactive dashboards than with the plotting libraries
alone. Still, it requires some basic HTML knowledge, and sooner or later you will dive
into cascading style sheets (CSS).

Dashboard layout with several plots and HTML elements using Dash, Plotly and Dash-Bootstrap-Components
© the author


How can I interact with my graph?


You are very pleased with the chart but it feels so static, there is not even a zoom!

Interactivity covers many different things. It starts with common operations like zoom
and pan. The next step is synchronized graphs: zoom and pan are applied simultaneously to
several charts that share an axis. You might also be interested in synchronous selection
on two graphs, also called brushing (example in Bokeh¹⁹). Matplotlib has such
interactivity for all render engines except within a notebook²⁰. There is, however, a
solution based on Matplotlib: mpld3 handles this and might provide all you need. Still,
the trend is to use newer libraries like Plotly or Bokeh, which have zoom and pan in
notebooks “out of the box”.
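Linked panning and brushing can be sketched in Bokeh as follows. The data is made up, and the sketch assumes a Bokeh version in which `figure` accepts `width` and `height`.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# One shared data source: a selection made in one plot applies to the other
source = ColumnDataSource(
    data={"x": [1, 2, 3, 4, 5], "y0": [2, 5, 3, 8, 6], "y1": [9, 4, 7, 1, 3]}
)
tools = "pan,wheel_zoom,box_select,lasso_select,reset"

left = figure(width=300, height=300, tools=tools, title="y0")
left.scatter("x", "y0", source=source, size=10)

# Reusing the x range links zoom and pan on the horizontal axis
right = figure(width=300, height=300, tools=tools, title="y1", x_range=left.x_range)
right.scatter("x", "y1", source=source, size=10)

layout = gridplot([[left, right]])
```

`bokeh.plotting.show(layout)` renders the two plots side by side, in a browser tab or in a notebook once `output_notebook()` has been called.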

Then come dynamic annotations. They range from hover information, shown when the mouse
is over a marker, to line plot highlights. Regarding hover, whatever the library used
(Matplotlib with mpld3 and plugins, Plotly, Bokeh), it means attaching an HTML div to
each marker, and probably some CSS as well.


Bokeh plot of a UMAP projection of the MNIST digits (784 pixels) in the 2D feature plane with hover information
containing the original image © the author

More complex interactions are related to filtering or querying the data. It can be close to
the zoom function when the filter is modifying a range (e.g.: daily / weekly / monthly
selector for a time series), or a selector on the series of facets, or even more complex
associations. Selectors are available in Plotly as Custom Controls²¹ and in Bokeh as
widgets²².

The common plotting libraries provide basic capabilities for interactivity, up to the
creation of some widgets, but, as for advanced layouts, I would suggest switching
directly to Plotly Dash, which is more versatile.

Is the rendering of my dataviz dashboard fluid?


The more complex the dashboard or the larger the data, the longer it takes to process,
thus the longer to render. It may be ok to wait for a few seconds to get a view of a static
plot. It is no longer ok to wait more than 1 second when the graph is interactive with
widgets and controls.

If fluidity is gone, you have four main solutions, with increasing complexity:

Simplify the dashboard with fewer plots and fewer controls. That may be OK, but then
you should ask why such complexity was needed in the first place.

Simplify the data, that is, show less data or data with less granularity. That may
provide a good tradeoff between features and accuracy. It leads to the next solution…

Offline pre-processing to pre-assemble the data shown in the dashboard. That probably
means storing new series or new tables, eventually leading to other issues linked to
data management. In the end, you will do a lot more data engineering and will probably
reach a dead end with too much data and too many tables. The way out of the dead end is
even more data engineering, with…

Online processing in dedicated servers and the design of an API.
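The second and third options often start with something as simple as a Pandas aggregation. The sketch below, on made-up data, reduces a daily series to weekly statistics before the dashboard ever sees it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical raw data: one measurement per day over 10 full weeks (Monday to Sunday)
daily = pd.DataFrame(
    {"value": rng.normal(100, 15, size=70)},
    index=pd.date_range("2021-01-04", periods=70, freq="D"),
)

# Offline step: aggregate to weekly mean, min and max, so the dashboard loads 10 rows, not 70
weekly = daily["value"].resample("W").agg(["mean", "min", "max"])

# Persist the pre-assembled table for the dashboard, for instance:
#   weekly.to_csv("weekly.csv")
```

Keeping the minimum and maximum next to the mean preserves some of the information that the coarser granularity would otherwise hide.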

A server with an API is not the first thing you had in mind, but you will end up dealing
with this software project sooner than you think. It is also better to anticipate it than
to delay until there is no other solution and project deadlines are approaching fast.

Defining an API often involves several teams and skills: data scientists to define the
why, data engineers to define the what, and infrastructure engineers to define the how,
including performance, security, persistence, and integrity.

Plotly Dash allows for an initial API since it is based on the Flask framework. See the
Dash documentation on integration with a Flask app, which defines a custom Flask
route that could serve data, i.e. an API²³. There is still no proper API access
management, including authentication. On that latter aspect, Dash is very limited²⁴.

While developing the dataviz, I feel like I am fast/slow/don’t know


How much effort will it take you to develop and publish?

Some tools are effective (they deliver the expected result) but not efficient: getting to
the result takes a large amount of time. For example, d3.js is known as a very powerful
and complete data-visualization and charting library, but at the same time it requires
dealing with many things that are available by default in libraries with a higher level
of abstraction.

Productivity comes not only from using the right level of abstraction but also from an
API that is easy to use and well documented. I would say that none of the surveyed
Python charting libraries is easy to master.

Matplotlib’s API is quite complex when dealing with axes and label formats; it is not
always consistent and is quite redundant. As an example, see the above comment on
“`plt.subplot()`”. That is not the only case: there is a function
“`plt.xlabel()`” that is equivalent to the method “`ax.set_xlabel()`” on the Axes object.
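The redundancy is easy to see side by side. In this minimal sketch, with arbitrary data and label text, both styles end up setting the same property on an Axes object.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

# State-machine style: pyplot functions act on the "current" Axes
plt.figure()
plt.plot([0, 1, 2], [0, 1, 4])
plt.xlabel("time (s)")
implicit_ax = plt.gca()

# Object-oriented style: the same label set through a method of an explicit Axes
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("time (s)")
```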


Plotly’s API is not better. First, you must choose between two API sets: the Express set,
which is quite simple but limited and mostly targeted at dataframes, and the Graph
Objects set, which is more complete and complements the Express set but lacks some of the
nice high-level features of Express. Then you will have to deal with the Plotly
documentation, which is, to me, really difficult to get through. Searching with either
the Plotly website’s internal search or a Web search engine seldom leads you to the API
you are looking for. And you may have to deal with the documentation of the underlying
data model in JSON.

Bokeh’s API is probably leaner and better documented but has some oddities, like requiring
two separate instructions to plot a line chart and its associated markers.

I really need a nice and slick web app, should I be afraid of it?
Your dashboard is successful and will be deployed as a product internally to your
organization, available to clients, or even directly exposed on the Internet.

As a data scientist, you lack some of the skills to handle that, so you get help from
software specialists. However, you are asked to estimate the effort or scope of this
development. This highly depends on the path to production, on the framework used there,
and on the framework you have been using until now.

Getting to production with (part of) current framework


Plotly standalone graphics can be exported as static HTML. Bokeh provides some
schemes to embed its plots²⁵. Matplotlib with mpld3 has an HTML output²⁷. However, this
solution targets illustrations rather than dashboards.

Plotly Dash may in some cases go up to production as the main Web app or as an embedded
widget²³. As said earlier, in such a setup you will need to inspect the security of your
system before jumping to online production. Regarding security, as a data designer, you
mainly need to check that you are exposing only the intended data and nothing more.


Network and geographical interactive graph showing the incoming subway traffic in Paris © Data-Publica

Getting to production with a single page application


Today, most of the Web applications we use are based on a pattern called single page
application (SPA): the application is loaded once in the Web browser and then interacts
with the server through some asynchronous API without reloading the Web page. This is
what all of us now expect from a nice Web application.

An SPA has two separate components: the browser-side application, in JavaScript with a
framework like Angular or React.js, and the server-side application or service, which may
be written in many frameworks and languages: Java, C#, PHP, JavaScript…, and even
Python.

Dash is already doing part of it. In fact, Dash uses one of the leading browser-side
frameworks, React.js, and on the server side is based on Flask and Python. But as said
above, you may reach some limits of Dash.

Besides the transition through Dash, Plotly and Bokeh have another advantage: they are
also available in JavaScript, as Plotly JS (with a React.js wrapper²⁶) and Bokeh JS. In
fact, the Python version of Plotly is a wrapper around the JavaScript one. This implies
that given some plots or dashboards in Python based on Plotly, Dash or Bokeh, most of the
concepts and chart properties can be reused in the equivalent JavaScript
implementation.

Conclusion

In this post, we have sketched the path for a data-visualization dashboard from
experiments within notebooks up to production. We have seen that the traditional
plotting library, Matplotlib, still has strong features and is usually the default backend
for specialized libraries like NetworkX and the Pandas DataFrame. But Matplotlib is also
lagging on some aspects, like integration and interactivity. Learning another framework
may be a good investment and will help you move forward up to production.

Two alternative frameworks are presented: Plotly and Bokeh. Both bring value as they
are more modern than Matplotlib. Both of them have a leading advantage when it comes
to bringing the dashboard to production: they are based on JavaScript plotting
frameworks, and most of the Python plotting code can be translated directly into its
JavaScript equivalent.

Plotly has another advantage on the path to production: it is integrated with Dash, a
framework to develop simple dashboards as single-page applications while sticking to
Python. The required JavaScript, including React components, and the server API are
generated smoothly by Dash.

We have also seen that, as a data scientist or data-visualization designer, you should
anticipate requirements like interactivity, and their implications that may lead to the
development of an API to serve data.

In “Plotly Dash or React.js + Plotly.js”, a side-by-side comparison shows what is
required to port a Dash application to a JavaScript application in React.js, with
Plotly.js for the dataviz, served by a Web API in Python.

References

[1] Word clouds with Matplotlib and word_cloud library, https://amueller.github.io/word_cloud/auto_examples/single_word.html

[2] NetworkX graph plot, https://networkx.github.io/documentation/stable/auto_examples/index.html#basic

[3] NetworkX with GraphViz, https://networkx.github.io/documentation/stable/auto_examples/pygraphviz/plot_pygraphviz_draw.html

[4] Plotly with NetworkX, https://plotly.com/python/network-graphs/

[5] Bokeh and NetworkX, https://docs.bokeh.org/en/latest/docs/user_guide/graph.html#networkx-integration

[6] Matplotlib to draw maps, https://matplotlib.org/basemap/users/examples.html

[7] Plotly and maps, https://plotly.com/python/maps/

[8] Bokeh and maps, https://docs.bokeh.org/en/latest/docs/user_guide/geo.html

[9] PyQGIS API, https://docs.qgis.org/3.10/fr/docs/pyqgis_developer_cookbook/

[10] Plotly as Pandas backend, https://plotly.com/python/pandas-backend/

[11] Bokeh as Pandas backend, https://github.com/PatrikHlobil/Pandas-Bokeh

[12] Plotly Express support for column-oriented data, https://plotly.com/python/wide-form/

[13] D3.js gallery, https://observablehq.com/@d3/gallery

[14] Matplotlib documentation on figure layouts, https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure

[15] Seaborn scatter matrix, https://seaborn.pydata.org/examples/scatterplot_matrix.html

[16] Bokeh layouts, https://docs.bokeh.org/en/latest/docs/user_guide/layout.html

[17] Plotly marginal plots, https://plotly.com/python/marginal-plots/

[18] Plotly’s range slider, https://plotly.com/python/time-series/#time-series-with-range-slider

[19] Bokeh brushing, https://docs.bokeh.org/en/latest/docs/user_guide/quickstart.html#linked-panning-and-brushing

[20] Matplotlib interactivity, https://matplotlib.org/users/interactive_guide.html

[21] Plotly controls, https://plotly.com/python/#controls

[22] Bokeh widgets, https://docs.bokeh.org/en/latest/docs/user_guide/interaction/widgets.html

[23] Plotly Dash integration, https://dash.plotly.com/integrating-dash

[24] Plotly Dash authentication, https://dash.plotly.com/authentication

[25] Bokeh integration, https://docs.bokeh.org/en/latest/docs/user_guide/embed.html#userguide-embed

[26] Plotly JS React wrapper, https://plotly.com/javascript/react/

[27] Mpld3 QuickStart, https://mpld3.github.io/quickstart.html
