DETAILS
75 pages | 8.5 x 11 | PAPERBACK
ISBN 978-0-309-45848-1 | DOI 10.17226/24755
CONTRIBUTORS
Nathan Higgins, Ronald Basile, Samuel Van Hecke, Joseph Zissman, and Scott
Gilkeson; National Cooperative Highway Research Program; Transportation
Research Board; National Academies of Sciences, Engineering, and Medicine
All downloadable National Academies titles are free to be used for personal and/or non-commercial
academic use. Users may also freely post links to our titles on this website; non-commercial academic
users are encouraged to link to the version on this website rather than distribute a downloaded PDF
to ensure that all users are accessing the latest authoritative version of the work. All other uses require
written permission. (Request Permission)
This PDF is protected by copyright and owned by the National Academy of Sciences; unless otherwise
indicated, the National Academy of Sciences retains copyright to all materials in this PDF with all rights
reserved.
Data Visualization Methods for Transportation Agencies
NCHRP Web-Only Document 226:
Data Visualization Methods for Transportation Agencies
Nathan Higgins
Ronald Basile
Samuel Van Hecke
Joseph Zissman
Cambridge Systematics, Inc. - Cambridge, MA
Scott Gilkeson
Takoma Park, MD
NCHRP Project 08-36, Task 128
Submitted August 2016
Copyright National Academy of Sciences. All rights reserved.
ACKNOWLEDGMENT
COPYRIGHT INFORMATION
Cooperative Research Programs (CRP) grants permission to reproduce material in this publication for classroom and not-for-profit purposes. Permission is given with the understanding that none of the material will be used to imply TRB, AASHTO, FAA, FHWA, FMCSA, FRA, FTA, Office of the Assistant Secretary for Research and Technology, PHMSA, or TDC endorsement of a particular product, method, or practice. It is expected that those reproducing the material in this document for educational and not-for-profit uses will give appropriate acknowledgment of the source of any reprinted or reproduced material. For other uses of the material, request permission from CRP.
DISCLAIMER
The opinions and conclusions expressed or implied in this report are those of the researchers who performed the research. They are not necessarily those of the Transportation Research Board; the National Academies of Sciences, Engineering, and Medicine; or the program sponsors.
The information contained in this document was taken directly from the submission of the author(s). This material has not been edited by TRB.
Table of Contents
Chapter 1 · Introduction and Background
1.1 · Visualization in Transportation
1.2 · Background
1.3 · Audience for the Guide
1.4 · Definitions
1.5 · Outline
Chapter 2 · How to Illustrate Data
2.1 · Common Chart Types
2.2 · Other Recommended Chart Types
2.3 · Common Techniques
Chapter 3 · Developing Effective Visualizations
3.1 · Data Wrangling
3.2 · Intent and Audience
3.3 · Analysis
3.4 · Choosing a Strategy
3.5 · Tools and Implementation
3.6 · Putting It All Together
Chapter 4 · Style Guide
4.1 · Basic Design Principles
4.2 · Font
4.3 · Color
4.4 · Federal Requirements for Style
Chapter 5 · Conclusion
This Guide is intended to help transportation planners create modern data visualizations. It is built for planners who want to learn the basics and peek around the corner at what is next once they have them mastered. It includes advice and best practices for developing visualization skills, enhancing transportation analysis, and improving public engagement. It considers advances in technology and communication, such as online tools, software, and technical support acquisition. The Guide is available in a website format at http://vizguide.camsys.com; it focuses on key takeaways and examples.
In some cases, these organizations can engage the time and skills of app developers, hobbyists, and academic researchers to display complex open datasets. For example, after the Massachusetts Bay Transportation Authority (MBTA) released time-bound subway location data, graduate students Michael Barry and Brian Card produced “Visualizing MBTA Data,” shown in Figure 2.
Figure 2: Screen Capture from “Visualizing MBTA Data” (by Michael Barry and Brian Card) http://mbtaviz.github.io/
Flows of funding from revenue sources to expenditures;
Maps of trip generation, rider origin and destination, transportation equity (service level vs. income level), and roadway congestion, among many others; and
Comments and feedback collected at public meetings – see Figure 3.
Figure 3: Graphic Recording (Champaign County Regional Planning Commission)
The best visualizations communicate information with intent to a specific audience. While sometimes charts are presented with labels, they are not always. An example of this, from the British air traffic controller NATS, is provided in Figure 4.
Figure 4: European Air Traffic over 24 Hours – GPS Locations of Aircraft (NATS)
Text intentionally left small to focus the reader on the overall image.
1.2 · Background
What’s the best stuff out there?
This Guide is informed by and founded upon a large-scale literature review and interviews. The literature review focused on best practices drawn from our professional experience and prior work, from online media (e.g., newspapers, blogs, and magazines), and from academia. While we paid special attention to work that touched the transportation industry, we also considered the literature review to be an opportunity to introduce best practices from other fields to transportation practitioners.
We conducted interviews with John Allen (New Jersey DOT and the CATT Lab), Dan Howard (San Francisco Municipal Transportation Authority), and Ben Shneiderman (University of Maryland). We adapted interviews conducted by others of Mike Bostock (creator of D3.js), David McCandless (author of Information is Beautiful), Tamara Munzner (University of British Columbia), and Fernanda Viegas and Martin Wattenberg (creators of ManyEyes for IBM). Wisdom from these subject matter experts appears throughout the Guide.
Key findings from our literature review and interviews include:
There are excellent visualization guides from private subject-matter experts and enthusiasts, presented as books and as blogs. For example:
Storytelling with Data – Cole Nussbaumer Knaflic http://www.storytellingwithdata.com/; and
Evergreen Data – Stephanie Evergreen http://stephanieevergreen.com/.
With that said, one of the subject matter experts noted the growth of “visualizing text” as a visualization task. Several survey respondents provided best-practice examples of using color, size, typeface, and geometric arrangement, among other strategies, to help key numbers and words “pop,” as in Figure 3;
There are many tools and types of tools for building best-practice visualizations, each with a different learning curve. Common programs like Microsoft Excel can create effective charts. A new class of Business Intelligence (BI) tools including Tableau, Qlik, and Microsoft’s PowerBI brings sophisticated visualization power to experts and casual users alike. Beyond these user-friendly tools, users with software programming capabilities can obtain several free and open-source tools such as D3.js to create interactive, web-based, data-driven visualizations; and
All subject-matter experts and best practices embrace simplicity as a driving principle. One interviewee, Dan Howard, noted that:
“If it’s too complex for you to explain [to the lay reader], that’s a signal that you’re in trouble.”
Figure 5: Organizations that Responded to the Visualization Survey for Transportation Practitioners (not pictured: Hawaii Office of Planning)
A dimension is an “attribute” of data (e.g., a column in a table) shown as a variation in the appearance of data points. Torsten Moller and Tamara Munzner refer to these as “channels” (Visualization Analysis and Design, 2014); and
A tool is a resource or software package used to build and publish visualizations.
Chapter 5 · Conclusion: This chapter summarizes the Guide and offers advice and inspiration from visualization experts.
Choosing a chart type often means deciding among a set of familiar favorites. Popular choices include map, bar/column, line/area, donut/pie, flow, treemap, heat map, scatterplot, pictograph, and node-link diagrams. While it is important to maintain fluency in your favorite charts in order to produce high-quality visualizations, it also is important to know when a certain chart is not the right choice to tell your story.
See the Charts section of the web version of the Guide or Appendix A for a selection of useful examples.
Geographic Maps
Maps are effective for communicating conditions in or differences among specific geographic areas. Geographic boundaries (e.g., states or counties) provide the foundation for many maps.
Types of Data
Geospatial data must have a dimension that associates it with a geographic area or location, such as:
Places and Political Entities – States, counties, ZIP codes or other well-defined political or administrative areas;
Pie – Bubbles are sized by value, divided by qualitative categories, and located on a map.
Dot Density – Dots vary in density by value and are located on a map.
Route – Lines representing transportation networks or paths between locations are sized by value and located on a map.
Flow – Arcs showing flow from one location or node to another are sized by value and located on a map.
Area Cartogram – Map areas are distorted by value while keeping the geography recognizable.
Be aware of the disproportionate effect of sparsely populated areas when using choropleth maps. These colored maps are eye-catching, familiar, and generally well-understood, but can be misleading because tightly-packed, densely populated areas like inner cities may not even appear at the scale being displayed, while large rural areas will dominate the visual field, even though they may represent few people. Interactive zooming can make the small areas visible, but the larger areas will always dominate the perceived coloration. Consider using a bar graph, which gives equal weight to each area, or a treemap, which can size areas by population rather than geographic size (albeit at the expense of easy spatial recognition). Another option is an area cartogram, which distorts the size of the area on the map while maintaining adjacency or relative position, to make the size proportional to the item of interest.
Improve memory and comprehension
Limit the information to just what your audience needs to understand your point. This is true for any visualization but especially so for maps. It can be tempting to include interstate shields, urbanized area boundaries, north arrows, and scales when they don’t add any information to the visual.
If interactive, make layers of information available via a layer menu because only parts of your audience will want to see them.
Limit yourself to five or fewer classes, particularly for choropleth maps. There are various ways to determine the break points for these sub-ranges (sometimes called ‘class breaks’):
Quantiles – An equal number of observations is assigned to each sub-range. Each color will appear the same number of times on the map, but the span of values covered by each sub-range may be large or small.
Equal Interval – Observations are assigned to some number of equal-sized sub-ranges. Some colors may not appear at all while others may appear frequently. Outliers can have a strong influence.
Natural Breaks – Observations are assigned to some number of sub-ranges based on how they cluster. This generally results in an attractive map.
Custom Whole Numbers – Observations are assigned to custom sub-ranges bounded by whole numbers. All the other class break choices tend to give odd-looking ranges like “12.3 to 15.7,” while viewers may be more comfortable with “10 to 15.” If the map is updated with new data, you will need to re-evaluate the suitability of the chosen breaks.
Associate the scale of light and dark colors with a scale in your data.
Use different color scales for positive and negative values (e.g., red for positive and blue for negative). Chapter 4 addresses the use of color, and ColorBrewer.org is a useful resource for selecting color scales.
Tools
All Types of Maps
Map Tools: Esri ArcGIS/ArcMap, QGIS; and
General Tools: Google Fusion Tables.
Basic Maps (Choropleth and Bubble)
Visualization Environments: Tableau, Qlik, Microsoft Power BI; and
For Developers: D3.js, R, Google Maps application program interface (API), Leaflet.
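The class-break choices described for choropleth maps can be compared on a small dataset before committing to a map. The following is a minimal sketch in Python, not a method prescribed by the Guide: the function names and the sample values are hypothetical, and it covers only the quantile and equal-interval methods (natural breaks usually requires a dedicated clustering routine).

```python
from statistics import quantiles


def equal_interval_breaks(values, n_classes=5):
    """Upper bounds of n_classes equal-width sub-ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    # The last bound is set to the maximum so no value falls outside.
    return [lo + width * i for i in range(1, n_classes)] + [hi]


def quantile_breaks(values, n_classes=5):
    """Upper bounds that put roughly equal numbers of observations in each class."""
    cuts = quantiles(values, n=n_classes, method="inclusive")
    return cuts + [max(values)]


def class_counts(values, breaks):
    """Count how many observations fall into each class."""
    counts = [0] * len(breaks)
    for value in values:
        for i, upper in enumerate(breaks):
            if value <= upper:
                counts[i] += 1
                break
    return counts


# Hypothetical data with one outlier (e.g., crashes per county).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(class_counts(values, equal_interval_breaks(values, 3)))  # -> [9, 0, 1]
print(class_counts(values, quantile_breaks(values, 5)))        # -> [2, 2, 2, 2, 2]
```

In this example the equal-interval scheme leaves one color unused because of the single outlier, while the quantile scheme uses every color equally often but hides how far the outlier sits from the rest.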
Bar Charts
Bar charts are useful for comparing quantities across one or more dimensions. The length of each bar represents the relative magnitude of an attribute (e.g., vehicle miles traveled by mode).
Types of Data
Bar charts require at least one quantitative dimension, which corresponds to the length of the bars, and one qualitative dimension, which is represented by different bars. Some variations can represent multiple quantitative and qualitative dimensions.
Quantitative – Bar charts are suitable for comparing nearly any quantitative data.
Qualitative – Bars must represent discrete categories.
Variations
Horizontal/Vertical – Bars extend horizontally or vertically (vertical bar charts are sometimes called column charts).
Clustered – Bars are grouped to show differences among categories of the data, with color representing different dimensions.
Stacked – Multiple bars are stacked on top of one another. The bars can be normalized so that bars represent percentages of the whole.
Diverging – Bars extend in positive and negative directions from the baseline to represent positive and negative values.
Bullet – Bars are overlaid on a background to compare quantitative measures (e.g., condition) against qualitative ranges (e.g., poor, satisfactory, and good). Markers are placed on the bar to indicate targets.
Histogram – Bars show the number of elements in each category. If the measure is continuous, observations are grouped into ranges.
Pyramid – Bars diverge to the left and right from a centerline showing the number of elements in each category. Often called a population pyramid, this chart often is used to show population by age group with bars for males extending to the left and females extending to the right.
Radial – Bars are arranged in a circle or spiral, rather than extending in parallel from a baseline.
Tips
Tell the same story as your data
Start from zero – Starting anywhere else distorts variation. If your data does not show much difference at full scale, that may be the story. If starting from zero is not an option, you might consider using a logarithmic scale or providing a zoom feature to show differences more clearly.
Use stacked bars with care – Stacked bars are not good for estimating percentages or comparing components because only the first segments line up.
Radial charts are visually appealing but make it difficult to compare values – They rarely are the best choice.
Improve memory and comprehension
Use five to eight bars – More bars make it hard to compare specific values. If you find yourself needing more bars, consider using a line chart instead.
Sort the bars – Sorting makes it easier to compare bars that are similar in height. However, be aware that sorting implies a ranking.
Horizontal layouts accommodate longer category names – Labeling charts is important and horizontal bar charts give you more labeling real estate;
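The normalized form of a stacked bar, in which each segment is a percentage of the whole bar, is a small calculation that can be done before charting. The sketch below assumes each bar is a mapping from component name to value; the function name and the sample figures are hypothetical, not taken from the Guide.

```python
def normalize_stack(components):
    """Convert one bar's component values to percentages of the bar total."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("bar total must be positive to normalize")
    return {name: 100.0 * value / total for name, value in components.items()}


# Hypothetical vehicle miles traveled by mode for one year.
vmt_by_mode = {"car": 60, "truck": 30, "bus": 10}
print(normalize_stack(vmt_by_mode))
```

Normalizing removes the information about each bar's total, so it suits comparisons of composition rather than of magnitude.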
For Developers: D3.js, R, Google Chart API.
Other Bar Types (Bullet, Histogram, Pyramid, Radial)
General Tools: Microsoft Excel (histogram starting with 2016);
Visualization Environments: Tableau, Microsoft Power BI (with custom visuals); and
For Developers: D3.js, R, Google Chart API.
Line Graphs
Line graphs are useful for showing trends and comparing them among variables. Line graphs show changes in a quantitative variable across some other ordered variable, usually time (e.g., increase/decrease in revenue over time).
Types of Data
Values (y-axis) – Line graphs are suitable for plotting one or more quantitative variables on the same chart; and
Period or Span (x-axis) – Line graphs typically show change over time. If the variable is not time, it should have a logical order so that moving from left to right has some meaning.
Variations
Segmented Line – Straight lines are drawn to connect data points.
Area Graph – Straight lines are drawn to connect data points and the area under the line is colored.
Stacked Area Graph – Multiple area graphs are stacked on top of one another. The areas can be normalized so that they represent percentages of the whole.
Streamgraph – A stacked area graph centered on an axis to create a flowing shape.
Tips
Tell the same story as your data
Start scale at zero to put your data into proper context. Focusing on a narrow range can make changes appear more dramatic than they really are. If starting from zero is not an option, use a logarithmic scale or provide a zoom feature to see differences.
Use two y-axes to plot dimensions with different scales at the same time – When necessary, plot variables with different ranges together to compare trends. Use two y-axes (usually placed on opposite sides of the graph).
Use stacked areas with care – Stacked areas are not good for estimating percentages or comparing areas because only the first components line up.
Improve memory and comprehension
Label lines directly – Label the lines directly on the graph to make them easier to read.
Highlight key events – Use arrows and callouts or shade background to annotate significant events.
If the chart is interactive, show points – In an interactive visualization, points add a visual cue for users to mouse over them to see more information.
If the chart is static, avoid showing points – A line without points looks sleek and uncluttered.
Use line thickness to make a statement – Thicker lines make a bold statement and are easy to see. Do not use thick lines if they obscure each other.
Show grid lines but make them nearly invisible – Show grid lines or reference lines to help people estimate the values but make them as much a part of the background as possible.
Use smoothed or regression lines to reduce the visual impact of variable data and maintain the overall shape of the line.
Watch for obscured lines – The maximum number of comprehensible lines may depend on how close together they are and how they look. If there is little difference among them, one line may obscure another. Use style to help differentiate among the lines.
Tools
Basic Line and Area (* can also produce streamgraphs)
General Tools: Microsoft Excel, Google Sheets;
Visualization Environments: Tableau, Qlik, Microsoft Power BI (using custom visuals*); and
For Developers: D3.js*, R*, Google Chart API.
Pie Charts
Pie charts give a general impression of the relative contributions of each part to a whole (e.g., the percent of congestion caused by different things). They show each portion as a slice of a circular pie.
Note: Some data visualization experts discount pie charts because humans are not good at recognizing slight differences in angles. This means that humans have a hard time comparing slices accurately (bar charts often are better at this).
A search of the web turns up many articles discouraging the use of pie charts. One example, from Business Insider blogger Walter Hickey: “The pie chart is easily the worst way to convey information ever developed in the history of data visualization.” (http://www.businessinsider.com/pie-charts-are-the-worst-2013-6 – accessed 2016). Such opinions generally cite Edward Tufte (The Visual Display of Quantitative Information, 2001) and/or Stephen Few (http://www.perceptualedge.com/articles/visual_business_intelligence/save_the_pies_for_dessert.pdf), two well-known data visualization writers.
Still, pie charts are effective when used correctly. At a glance, the audience can see divisions of a whole and discern high-level proportion (e.g., a quarter or a half). They also can sum adjacent slices (e.g., the two largest segments account for more than half of all cases).
Types of Data
Parts of a Whole/Percentages – Use when there are fewer than five to eight slices and when the sum of all slices is exactly 100 percent.
Variations
Pie – A circle is divided into segments (arcs) by value with each segment representing one portion of the whole.
Donut – The center is removed from a standard pie chart, leaving a ring of arcs. The blank center area can be used for labeling or other purposes.
Multi-Tier (Sunburst) – Multi-tier pie or donut charts add one or more rings around the original chart, with each segment of the outer ring further subdivided to show hierarchies in data. Major segments in the outer rings must align with their inner counterparts, although some outer segments may be missing.
Tips
Tell the same story as your data
Use only when data represent a whole – This also goes for normalized stacked area or bar charts, but is particularly important for pie charts.
Consider a bar chart – If the intent of the chart is to compare values, bar charts are better suited.
Avoid one very small slice – Small slivers are hard to distinguish and hard to label. If it makes sense for your data, combine a number of the smallest segments to create an ‘other’ category.
Pie and Donut charts (asterisk = can also produce sunburst charts)
General Tools: Microsoft Excel (*starting with 2016), Google Sheets;
Visualization Environments: Tableau*, Qlik, Microsoft Power BI*; and
For Developers: D3.js*, R*, Google Chart API.
Flow Charts
Quantitative values associated with starting and ending points or states – To plot a flow, you need starting points, each with one or more ending points, and a measure of flow between them, such as volume. You can display categorical data as well, typically by coloring the point or the path. For example, you can plot the flow of freight tons from origin to destination seaports, distinguishing type of freight.
Sankey Diagram – Shapes are connected by arrows sized relative to value. Typically, Sankey diagrams show the flow of energy or money through a process.
Make sure that your flows match – A Sankey diagram requires that you balance the outflow and the inflow for each node.
Improve memory and comprehension
Minimize overlapping flows to make it easier for the audience to understand your intent. Complex flows may mean lots of lines crossing; experiment with various layouts to minimize that. Interactivity can help by highlighting one pathway on mouseover.
EventFlow/LifeFlow is part of an ongoing research program at the University of Maryland, and is available for commercial licensing as well as non-commercial use.
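The requirement that a Sankey diagram's flows balance at each node can be checked before the diagram is drawn. The sketch below is one way to do that in Python, assuming flows are stored as (source, target, value) tuples; the function name, the tolerance, and the funding figures are hypothetical. Nodes that only send or only receive flow (pure sources and sinks) are not checked.

```python
from collections import defaultdict


def unbalanced_nodes(flows, tolerance=1e-9):
    """Return {node: inflow - outflow} for intermediate nodes that do not balance.

    flows is a list of (source, target, value) tuples.
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in flows:
        outflow[source] += value
        inflow[target] += value

    problems = {}
    # Only nodes with both incoming and outgoing flow need to balance.
    for node in set(inflow) & set(outflow):
        difference = inflow[node] - outflow[node]
        if abs(difference) > tolerance:
            problems[node] = difference
    return problems


# Hypothetical funding flows from revenue sources to expenditures.
flows = [
    ("Fuel tax", "Highway fund", 70),
    ("Fees", "Highway fund", 30),
    ("Highway fund", "Maintenance", 60),
    ("Highway fund", "Construction", 40),
]
print(unbalanced_nodes(flows))  # an empty dict means every intermediate node balances
```

A non-empty result names the nodes to investigate; a positive difference means more flows into the node than out of it.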
Heat Maps
Quantitative values in an ordered field – Heat maps depend on assigning colors to a range of values, and the values must have a logical order such that adjacency is meaningful. For example, heat maps can show the concentration of crashes by the days on a calendar, of where a user’s eyes look on a webpage, or of where jobs are located.
Variations
Cluster Heat Map – A shaded/colored matrix, with rows and columns arranged to highlight a relationship (e.g., number of trips by origin and destination of travel).
Use red and yellow for hot and green and blue for cool – Red and yellow are perceived as “hot” colors, making them good for showing positive attributes or other hot spots. Green and blue are perceived as “cool,” making them good for showing negative values or other cool spots. Chapter 4 describes general use of color.
For Developers: D3.js, R, Google Chart API, Leaflet.
Scatterplots
Scatterplots are effective at showing how two variables relate to each other. A scatterplot displays values for two variables on a grid. The data are displayed as points, positioned according to the value of one variable on the x-axis and the other on the y-axis. Unlike line charts, the x-axis does not require any logical order.
Scatterplot – Points are placed on a graph based on two values. An algorithm can be used to fit a line that passes through the points. A regression line will show the overall trend.
Bubble Chart – Points are placed on a graph based on two values and sized based on a third.
Motion Chart – Points are placed on a graph based on two values, sized based on a third, and put into motion based on a fourth (typically time).
Tips
Tell the same story as your data
Size bubbles based on area, not diameter – When given the option, correlate bubble area (not diameter) to your values.
Improve memory and comprehension
Consider adding reference lines – Draw attention to a certain category of values by using lines to highlight the median, average, or target value on each axis. This is useful when assessing risk or project priorities.
Use transparency to help with overlapping data points – Overlapping dots get darker, suggesting clusters of data.
The individual dot is not as important as the general shape – The many points on a scatterplot can be close enough to appear as a mass or line.
Tools
Scatter and Bubble Plots
General Tools: Microsoft Excel, Google Sheets;
Visualization Environments: Tableau, Qlik, Microsoft Power BI;
For Developers: D3.js, R, Google Chart API.
General Tools: Gapminder, Google Charts; and
For Developers: D3.js, R, Google Chart API.
Pictographs
Pictographs are useful for making simple data more approachable and memorable. They use graphic symbols (e.g., a figure of a person to represent people) to depict data.
Types of Data
Single Quantitative Measure – Pictographs show how many there are of something.
Variations
Dot Matrix Diagram or Icon Array – Graphic symbols are laid out in a grid and colored to denote the group to which they belong. The entire grid represents a denominator and the colored group a numerator (for example, seven out of a hundred people). A symbol can be partially colored to represent a fractional part.
Symbol Bar Chart – Icons are arranged in a row or column to resemble a bar chart. Like a bar chart, length (determined by number of icons) represents magnitude.
Tips
Tell the same story as your data
Use natural frequencies – Viewers understand ‘x of 100’ (or 10) better than numeric percentages.
Beware of volume distortion – if using icon size to show value, correlate to
volume rather than height or width; use only one shape.

Improve memory and comprehension

Choose meaningful icons – Using emotional rather than abstract imagery
(e.g., outlines of humans vs. circles) can increase interest and attract viewers.
Place the icons next to each other for greater impact. Don’t distribute the
numerator icons over the entire array unless the point you are making is the
randomness of these occurrences. When showing two icon arrays, use the
same denominator, to enable effective comparison.

Tools

All Types

General Tools: Microsoft Excel (with effort), PowerPoint, or Visio; Adobe
Illustrator or Photoshop;

Visualization Environments: Tableau, Qlik, Microsoft Power BI (all with
effort); and

For Developers: D3.js, R, Google Chart API.

Types of Data

Hierarchical quantitative data – Treemaps represent quantitative data by dividing
a space into areas relative to the quantity. They show hierarchy by nesting areas
within larger areas. An additional quantitative or categorical dimension can be
represented by coloring the areas.

Variations

Treemap – Rectangles are sized relative to the value and organized in an
alternating vertical and horizontal pattern or by category and packed into
larger rectangles.

Circle Packing – Bubbles are sized relative to the value and organized by
category and packed into larger circles.

Note: Multi-tier pie charts (also called sunburst charts) display hierarchical
data as well. They are discussed under Pie Charts.

Tips

Tell the same story as your data

Consider whether your nodes belong in a hierarchy – If you don’t have
hierarchical data, consider using a bar chart.
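
The alternating vertical and horizontal pattern described for treemaps can be sketched as a short recursive layout, often called slice-and-dice. The following is a minimal illustration under stated assumptions, not the method used by any tool named in this document: the nested-dictionary input format, the function names, and the bridge counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    label: str
    x: float
    y: float
    width: float
    height: float


def total(node):
    """Sum of leaf values under a node (assumes positive values)."""
    if "children" in node:
        return sum(total(child) for child in node["children"])
    return node["value"]


def layout(node, x, y, width, height, depth=0):
    """Return one Rect per leaf, splitting side by side at even depths
    and top to bottom at odd depths."""
    if "children" not in node:
        return [Rect(node["label"], x, y, width, height)]
    rects = []
    node_total = total(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node["children"]:
        share = total(child) / node_total
        if depth % 2 == 0:
            rects += layout(child, x + offset * width, y,
                            share * width, height, depth + 1)
        else:
            rects += layout(child, x, y + offset * height,
                            width, share * height, depth + 1)
        offset += share
    return rects


# Hypothetical bridge counts by owner and condition.
bridges = {"label": "All bridges", "children": [
    {"label": "State", "children": [
        {"label": "State / good", "value": 30},
        {"label": "State / poor", "value": 10},
    ]},
    {"label": "Local", "value": 60},
]}

rects = layout(bridges, 0, 0, 100, 100)
for r in rects:
    print(f"{r.label}: {r.width:.0f} x {r.height:.0f} at ({r.x:.0f}, {r.y:.0f})")
```

Each leaf's area is proportional to its value, and the two State rectangles nest inside the State share of the canvas, which is how the hierarchy becomes visible.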
Node-Link Diagrams, also known as network graphs,
show entities and their relationships. Generally, entities
are expressed as nodes (dots), and relationships (or
edges) as links (lines).

Types of Data

A finite set of nodes, each representing an entity – Nodes may have a quantitative
value, which can be expressed by the size of the node. Nodes can have
characteristics represented by color and size. An edge connects nodes to other
nodes. Edges also can have size and color characteristics.

Variations

Arc Diagram – Nodes are placed along an axis with arcs connecting them.
The arc’s lines can be colored or made thicker relative to the frequency of
the connection.

Tree Diagram – Boxes or nodes are connected in a hierarchy of relationships,
as in the classic organizational chart. The tree can start with a node at the
top or bottom.

Chord Diagram – Nodes are placed along a circle with arcs connecting
them. The arc’s lines can be colored or made thicker relative to the
frequency of the connection.

Force-Directed Graph – Nodes are placed such that connecting edges are
about the same length and have as few crossings as possible.

Tips

Tell the same story as your data

Watch for overlapping arcs – Too many overlapping arcs can make them
difficult to understand. Play with the layout to highlight your intent.

Minimize the number of nodes – Arc and chord diagrams can be difficult to
grasp, but you can help viewers by using them only for data with a limited
number of nodes and providing a way to highlight specific chords, arcs, or
edges.

Tools

All node-link diagrams

General Tools: Microsoft Excel (with NodeXL add-on, Node-link only);

Visualization Environments: Tableau (with effort), Qlik (with D3.js
extension), Google Fusion Tables, Microsoft Power BI (with custom
visuals); and

For Developers: D3.js and R.

2.2 · Other Recommended Chart Types

These additional chart types are fairly common and may be a good choice for
particular visualizations. This list is not comprehensive, however, as many unique
chart types exist and analysts are constantly developing more.

Tables – One of the most common ways of presenting
numbers is in a table, where rows and columns represent
some meaningful concept. Tables can be the best way to
present a lot of numbers if the important take-away is the
number itself. Visual elements, such as color or small
graphs (called Sparklines in Excel) can be added to a table
for emphasis or to facilitate comparison.
Consider each step in the process of developing an effective visualization in order
to imbue the finished product with focus and meaning. First, you must acquire
and refine a dataset – a process called “data wrangling” in the visualization
community – analyze the data, and identify patterns and findings that you can
call out visually. To hone your message, identify your intent and audience – who
needs to know about your data, what do you want them to think and do about it?
For example, do you want them to change their daily behavior or try to change a
law?

With data in hand and clear intent, identify and execute a strategy, using
appropriate charts and communicating with clarity. Finally, use the best tools to
implement and share the project effectively within your organization’s practice.

3.1 · Data Wrangling

Find your data and make it yours

Before you begin visualizing data, you must find, acquire, and prepare it. Analysis
and visualization require accurate data that are well-structured for your task. The
process of transitioning raw data inputs into presentable data sets has come to
be called “data wrangling.”

Martin Wattenberg and Fernanda Viegas – cofounders of the IBM ManyEyes
project – note that it is important to work with real rather than mocked-up data,
since manufactured data will rarely contain the nuances of the real thing.
Wattenberg compares working with real data to getting feedback from real
people. Information is Beautiful creator David McCandless observes that
sometimes the data may seem boring, and in these cases the practitioner may
be able to find additional data to normalize, compare, or merge, or the boredom
might be a cue to ask deeper questions.

Jeffrey Heer, co-creator of the Trifacta data-wrangling tool, has cited survey
results showing that between 50 and 80 percent of productive time spent by
industry data analysts is for formatting and integration. His team uses a process
for data wrangling that includes the following tasks:

Discover the content and patterns in your data. “Sketching” your data can
provide a positive feedback loop. Illustrating a dataset can make outliers
and patterns in the data obvious where a spreadsheet might hide them, and
it simplifies a key thought process – are the outliers mistakes or do they point
to a real phenomenon?;

Structure the data to have only the needed attributes, named and formatted
in a way that maximizes comprehension;

Clean the data to eliminate meaningless or undesirable outliers (e.g., null
values reported as 0 or 99);

Enrich the data with relevant additions that illuminate trends or provide
necessary context; and

Validate the prior steps by, at a minimum, assessing whether each attribute
is formatted properly and falls within logical constraints (e.g., percentage
sums to 100).

Volume of Data

A transportation practitioner is as likely to display a single data point as to
portray millions. For example, a report to the residents of a town might wish to
convey the 0-9 NBI rating of a local bridge.

This one data point can be placed in context (e.g., a bar chart scaled from 0-9),
translated (e.g., / ), or illustrated (e.g., diagrams or photographs showing
the damage that drives the rating). On the other hand, the same agency may
wish to convey a dozen condition metrics on thousands of bridges through a
single visualization. Methods that would make little sense for the single data
point, such as geographic search functionality, mouse-over information
windows, and animation, become sensible for larger datasets.

Volume of data is closely tied to enrichment – you may need to add
data to provide context and visual interest when you have a small dataset. For
example, West Virginia DOT visualized four alternatives for replacing the Dick
Henderson Memorial Bridge, as shown in Figure 6.

Figure 6: Build Alternatives Comparison for the Dick Henderson Bridge (WVDOT)
Text intentionally left small to focus the reader on the overall image.

With the larger data set, the alternatives (i.e., the data points) provided a full
context for the data and fulfilled the designer’s intent – to compare the cost,
closure time, and maximum grade of each design while also demonstrating the
aesthetics of each.

By contrast, once Alabama DOT selected an alternative for the Mobile River
Bridge, its intent became selling the project to neighbors by demonstrating the
visual impact of the structure. To do this, the Visualization Team enriched the
bridge model with four square miles of Downtown Mobile, to allow residents to
“see” the bridge from their doorstep. Figure 7 portrays an overview.

Figure 7: 3D Model for the Mobile River Bridge (ALDOT)
https://informedinfrastructure.com/18532/building-a-blockbuster-bridge/

Acquiring Data

Data may be available in-house, but rarely are they already clean and in the
ideal format. If data are acquired from a vendor, the format may be negotiable,
but adapting them for the chosen visualization platform may still take some effort.
Our survey respondents often visualized in-house data, but also often augmented
them with free data from several common sources, including:

Highway Performance Monitoring System (HPMS) – HPMS is an information
system maintained by the Federal Highway Administration (FHWA) that is
built from required annual submissions by DOTs. Statistics including
mileage, pavement condition, traffic volume, and functional classification
can be found at http://www.fhwa.dot.gov/policyinformation/statistics.cfm.
It is important to note that the website can be challenging to navigate and
does not function with all browsers – we recommend Microsoft Internet
Explorer;

National Bridge Inventory (NBI) – Like HPMS, NBI is compiled from annual
DOT submissions to FHWA. For each bridge of sufficient size, states are
required to provide physical characteristics (e.g., type, length, height), as
well as the results of an annual inspection and condition assessment. The
data can be downloaded at https://www.fhwa.dot.gov/bridge/nbi/ascii.cfm.
A delimited format is available behind the link for each year, for easy upload
into Microsoft Excel or any other data wrangling and analysis tool;

US Census and the American Community Survey (ACS) – The US Census
takes place every 10 years and participation is compulsory for all US
residents. To fill in the intervening years, the Census Bureau completes the
ACS annually using a sample of households in each census tract (approx.
3.5 million individuals per year). Generally, the ACS will be combined over
a 3- or 5-year period. The demographic and employment data (including
vehicle ownership and commute mode choice) from the Census and ACS
are available through American FactFinder at http://factfinder.census.gov/;
and

GIS Sources – Where an agency standard basemap does not exist, Esri
provides options to customers of its ArcGIS package, while OpenStreetMap
relies on a worldwide network of contributors to provide a basemap for free
with attribution. The primary means of accessing all of these alternatives is
through a GIS tool. Beyond basemaps, most states maintain free GIS
datasets and formatted layers for public use that are easily found through an
online search, as do Federal Agencies such as the US Geological Survey
(USGS) and the US Census Bureau Topologically Integrated Geographic
Encoding and Referencing (TIGER).

Data may also be acquired through a “data scraper,” a procedural routine –
typically based online – that extracts data from websites and documents to
convert it into a tabular format (e.g., www.import.io). The practitioner can use
these tools to collect and store a live feed over an extended period of time –
either for retroactive analysis or to develop a visualization of live data.

(The video is a production of WGBH Educational Foundation © 2015)

Key points of interest, with time stamps of their locations in the video, include:

Brief Overview of the Product: 1:52 – Card demonstrated an animated map
of train positions, a static line chart (with time and space on the axes) with
annotations for important events, a heatmap of station entries and exits
against time, and a scatterplot of overall transit times (including on-train
travel and wait times) for each pair of stations over the course of the day;

Research: 4:10 – The team discussed which elements of the MBTA would be
interesting for users and the public to experience visually. Card emphasized
the importance of identifying objectives in advance, because “once you have
a dataset, you start thinking in terms of what’s easy to do with that data,
instead of what’s important.” Barry and Card chose to focus on locating
congestion and delay, illustrating the impact of large events and snowstorms,
and giving each user a takeaway about his own commute;

Brainstorming: 8:00 – Barry and Card brainstormed illustrations iteratively
by sketching them on paper and uploading them to Google Docs for
comment;

Data Acquisition: 9:10 – A snapshot of train positions is publicly available
from the MBTA. Barry and Card periodically downloaded the snapshots to
form a month-long dataset. Each member of the team added a redundant
set of records each minute. Merging the datasets resolved missing records,
as shown in Figure 8;

Figure 8: Use of Redundant Datasets in “Visualizing MBTA Data” (Brian Card)
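
A feed collected over an extended period will usually have gaps, and collecting it redundantly from more than one machine is one way to close them. The sketch below merges two overlapping snapshot logs and removes duplicate records. It assumes a hypothetical record layout (time, train identifier, position) with made-up sample values, and it is not the code used in any project described here.

```python
def merge_snapshots(*logs):
    """Combine redundant snapshot logs into one time-ordered list.

    Each record is a dict with the hypothetical keys "time", "train",
    and "position". Records sharing the same (time, train) pair are
    treated as duplicates, and the first one seen is kept.
    """
    merged = {}
    for log in logs:
        for record in log:
            key = (record["time"], record["train"])
            merged.setdefault(key, record)
    return [merged[key] for key in sorted(merged)]


# Two collectors, each missing a different minute of the feed.
collector_a = [
    {"time": "08:00", "train": "R1", "position": "Alewife"},
    {"time": "08:02", "train": "R1", "position": "Porter"},
]
collector_b = [
    {"time": "08:00", "train": "R1", "position": "Alewife"},
    {"time": "08:01", "train": "R1", "position": "Davis"},
]

combined = merge_snapshots(collector_a, collector_b)
print([record["time"] for record in combined])
```

Keying on the pair of time and train identifier is a design choice for this illustration; a real feed may need a different notion of what counts as the same record.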
Data Wrangling Tools: 14:40 – Barry and Card used Node.js (a JavaScript
runtime) for processing their files in the JSON format. The visualizations
themselves are built in D3.js, and the code is stored in BitBucket because
GitHub (a more commonly-used competitor) makes all draft code public;
and

Iteration: 16:10 – “Not all of the ideas that look good on paper look good
with real data… we had 6,000 JSON files and no idea what our dataset
looked like. The only way that we could look at it was by building
visualizations.” Barry and Card built many draft visualizations and tested
them against their objectives. If an attempt did not tell their intended story,
Barry and Card not only tried again but also attempted to identify elements of
the failed attempts that were interesting and could inform future attempts.

Barry takes over the presentation at 17:00 and describes the team’s chart type
and stylistic choices. We will pick up the description in Section 3.4.

3.2 · Intent and Audience

What’s your story, and who needs to hear it?

Conceptualizing and planning a visualization project is about telling a story, so
you can frame it around your intent and audience:

Intent is the “nugget of truth” that a visualization must make obvious. This
visualization may be the only thing your audience knows about this topic.
What do you want that to be, and what do you want them to do as a result?

Your audience should be comfortable with your tone and level of technical
language, so align it to your audience’s role and experience. Make
comparisons, allusions, and references that tell your audience “I get where
you’re coming from, and I’m meeting you there.”

You should keep your desired outcome – an element of intent – in mind
throughout the process. Is your intent simply to inform your audience about a
topic, or do you wish for them to take action? If so, what type of action? Do you
need to highlight certain elements of a dataset not only because they are
interesting, but also because they relate to an important proposal or initiative for
which you want to gain support?

Reviewing your data before you start may lead you to an insight to explore
through visualization. Amanda Cox, editor of “The Upshot” at The New York
Times, recommends that you “learn to sketch with data,” by which she means
creating rapid, low-fidelity sketches of various visualizations to identify patterns
and findings that will interest your intended audience. This way of designing
allows you to put tangible products in front of people for discussion.

Intent

“Intent” is the question you want to answer or the outcome you want to
encourage. Generally speaking, a visualization will convey a fact or an argument
about a topic. For transportation practitioners, frequent topics include proposed
projects, assets (e.g., bridges, roadways, bike lanes), the traveling public, and
budgets. In many cases, the transportation practitioner must assume that the
audience’s entire understanding of a topic will be driven by a particular
illustration.

Being firm in your objectives can help you build a focused visual. In a blog
post entitled “visualizing opportunity,” visualization author Cole Nussbaumer-Knaflic
demonstrates how a focus on communication leads from a formatted
table to a more intuitive view of key characteristics and elements within the
dataset. We summarize her process here, beginning with Figure 9.

Figure 9: Initial Formatted Table, “visualizing opportunity” (Cole Nussbaumer-Knaflic)
http://www.storytellingwithdata.com/blog/2015/9/16/visualizing-opportunity

Nussbaumer-Knaflic makes the following immediate refinements:

The blue background represents a meaningless variation in color, so it is
removed;

The sample size does not lead a reader to any interesting conclusions (i.e.,
it is not part of her intent), so she moves it to a footnote;

For a focused visualization, she applies a heatmap to the more easily-understood
metric: average score.
After these refinements, the table appears as shown in Figure 10.

Figure 10: Intermediate Formatted Table, “visualizing opportunity” (Cole
Nussbaumer-Knaflic)

She then notes that her objective is to show opportunity: how much better could
we be doing in each category? So she revises her chart type to a stacked bar
with a transparent gap between reported and benchmark performance, yielding
her final product as shown in Figure 11.

Figure 11: Final Formatted Table, “visualizing opportunity” (Cole Nussbaumer-Knaflic)

Through this process, Nussbaumer-Knaflic has clarified the context of her data,
focused the audience on the most important metric, and communicated
additional information about that metric (the opportunity for improvement) by
visualizing the data rather than stating it.

Audience

A beautiful and informative visualization does no good if its target
audience cannot understand it. An overly technical illustration will not effectively
reach an audience of laypeople. A designer can positively impact audience
response by playing to its known interests through:

Visual Cues – Section 3.4 will discuss the use of human-recognizable
objects. Beyond using familiar imagery, you may wish to tie the cues directly
to your audience. Pictograms of local landmarks and icons as well as color
schemes taken from an agency, state, or local university or sports team, can
communicate your desire to connect with them;

Tone – Beyond avoiding technical jargon, your tone should be intentional.
If your audience is expecting something casual, formal language will fail to
resonate, and vice versa; and

References – With almost any data project for a local, regional, or State
agency, information should be compared locally unless the intent is to place
local data in a national or international context.

An example of how these concepts can be applied: Chris Hedden, Dan
Krechmer, and Ron Basile of Cambridge Systematics produced a cartoon-based
slide presentation (Figure 12) to inform Transportation Planners about connected
and self-driving cars.

Figure 12: “The Top Five Things Planners Need to Know About Self-Driving Vehicles”
https://www.camsys.com/insights/top-5-things-planners-need-know-about-self-driving-vehicles

Despite the technical audience, they chose a casual approach to convey the
inevitable ubiquity of the technology and the high-level approach of the slides,
and to capture an audience that might avoid the topic because it was widely
perceived as too complex to address. The audience became open to taking in
the technical details because they were presented in an accessible manner. The
document achieved record views and inquiries, suggesting that it motivated
people to delve into the topic further.
3.3 · Analysis

Are you and your data telling the same story?

Your analytical and aesthetic decisions should reflect the nature of your dataset.
Explore how much data you have, how many ways it can vary, and your need to
illustrate uncertainty. Selecting a chart type or homing in on a “look” without
considering the data may make your visualization difficult to comprehend.

Analysis is part of a feedback loop with Data Wrangling and Intent – If you realize
that your data don’t tell the story you wanted, do you clean, manipulate, or add
data, or do you want to re-evaluate the argument you are making? Do your
outliers signal error, or do they have meaning that you need to consider? Are
your data in general trustworthy: do you need to show uncertainty?

Visualizing for an Audience of You

As with the other elements in the feedback loop, one way to make analysis easier
is to visualize early and often. It will help you understand the data and, as a
result, use it more appropriately. You are creating a visualization because it will
illuminate patterns and increase clarity for your audience – take advantage!

In a March 2010 interview with acmqueue, Fernanda Viegas notes the importance
of identifying patterns through iterative visualizations:

“[We] spent the whole summer trying to figure out a good way to visualize
[Wikipedia] editors, but we kept getting these not-very-useful results. At one point
we tried just to get a sense of the shape of the data using bar charts, line graphs,
and stack graphs, but that wouldn’t tell us anything either.

Eventually, we decided to try out a very weird technique, which was mapping
streams of text to colors. This makes you lose a lot of information because text is
really rich and you can only use so many colors. All of a sudden we saw patterns.
Someone was going around all of Wikipedia correcting typos; another person
was working on images; another was working on stub sorting…

Looking back, we feel that the very first experiments we did with the data were on
too high of a level. They were abstracting too much away from the data and not
giving you this sort of messiness that Wikipedia has, which is everybody’s there,
every day making minute changes… that add up to patterns.

This notion of how close to the data you want to be and what is your question –
what is the story you want to tell? – seems to be really important.”

Data Literacy

To be data literate, you must understand what your data both can and cannot
be made to communicate, and identify where relevant uncertainty can be shown
visually. A lack of absolute certainty is not an impediment to effective
visualization, and not all uncertainty is necessary to illustrate. Furthermore, data
literacy can aid in the analysis-intent feedback loop – a logical problem often
offers an opportunity to improve your message.

Critiques of data literacy and appeals to critical thinking can be found in many
forms and from many commentators. In the Data Journalism Handbook, Nicolas
Kayser-Bril outlines some of the pitfalls of drawing unsupported conclusions:

“When writing about an average, always think ‘an average of what?’ Is the
reference population homogenous? Uneven distribution patterns explain why
most people drive better than average, for instance. Many people have zero or
just one accident over their lifetime. A few reckless drivers have a great many,
pushing the average number of accidents way higher than most people
experience.”
(http://datajournalismhandbook.org/1.0/en/understanding_data_0.html)

Applying this principle to a transportation context, it may be the case that the
majority of intersections experience below average accident rates, or the majority
of bridges have above-average maintenance records. When visualizing these
datasets, you should be prepared both to respond to an audience that points out
these “logical flaws” and to reflect them in your intent. Do you want to visualize
the difference from the average, or can you reduce your sample set by focusing
only on the problem locations?

“Articles about the benefits of drinking tea are commonplace… although the
effects of tea are seriously studied by some, many pieces of research fail to take
into account lifestyle factors, such as diet, occupation, or sports. In most countries,
tea is a beverage for the health-conscious upper classes. If researchers don’t
control for lifestyle factors in tea studies, they tell us nothing more than ‘rich
people are healthier, and they drink more tea.’”
Once again applying the principle to transportation, a map of mode choice
across a region may show lower-income areas commuting by transit more often
than by single-occupancy vehicle, except in areas near centers of service
and manufacturing employment (which have shifts outside of transit operating
hours). It would be insufficient to simply draw conclusions about mode choice in
these neighborhoods without accounting for these demographic trends; adding
them presents the opportunity to provide your audience with useful insight and
illuminate new parts of your data.

Beyond simply showing the audience that the data do not present certain
conclusions, you also can develop and visualize scenarios based on varying
assumptions, as demonstrated by the Victoria Transport Policy Institute in
Figure 13.

Figure 13: “Autonomous Vehicle Sales, Fleet and Travel Projections” (VTPI)
http://www.vtpi.org/avip.pdf

Figure 14: 3D Worldwide Air Pollution Map, where Color Indicates Confidence (Kai
Pöthkow, Britta Weber, and Hans-Christian Hege, “Probabilistic Marching Cubes.”
Computer Graphics Forum, 30(3):931-940, 2011.)

Figure 15: “Cat’s Eye” Approach to Visualizing Statistical Error (Geoff Cumming)
http://www.psychologicalscience.org/index.php/publications/observer/2014/march-14/theres-life-beyond-05.html

Using a visualization strategy that clearly communicates that the data are
not meant to be exact (e.g., shapes instead of columns on a column chart);

Fading edges, increasing transparency, or in some other manner altering
the appearance of conventional data points (as shown in Figure 14); and

Including error bars (or an alternative approach, the Cat’s Eye, shown in Figure 15).
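
Error bars need a numeric interval to draw. As a rough illustration, the sketch below computes a sample mean and an approximate 95 percent confidence interval using the normal critical value 1.96; for small samples a t-distribution value would be more appropriate. The travel-time observations and the function name are hypothetical.

```python
import math
import statistics


def mean_with_interval(values, z=1.96):
    """Return (mean, lower, upper) for an approximate confidence interval.

    Uses the standard error of the mean and a normal critical value
    (z=1.96 for roughly 95 percent). Requires at least two values.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    margin = z * standard_error
    return mean, mean - margin, mean + margin


# Hypothetical travel times in minutes for one corridor.
travel_times = [10, 12, 11, 13, 14]
mean, lower, upper = mean_with_interval(travel_times)
print(f"mean {mean:.1f} min, error bar from {lower:.2f} to {upper:.2f}")
```

The lower and upper values are what a charting tool would take as the ends of the error bar around the plotted mean.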
Using Visualization to Drive Analysis

Beyond the need to perform analysis to drive your visualization, it is important to
recognize your visualization’s potential for informing and facilitating analysis
done by others.

For example, the Delaware Valley Regional Planning Commission (DVRPC)
developed the Ridescore metric for bicycle accessibility at Philadelphia-area
commuter rail stations. Not only does the metric combine many measures of
accessibility into one easily consumed number, it also allows for the data to be
presented in a single map. The screenshot in Figure 16 shows this map, which
leaves the immediate impression that bicycle accessibility improves the closer
one gets to the city center, as well as identifying outliers – suburban stations with
superior access for cyclists. The same interface displays the constituent scores
when a user clicks on a station.

Virginia DOT (VDOT) provides another example in Figure 17. DOTs are
adopting dashboards to illustrate system performance, either in a static form (i.e.,
to report performance to the public) or in an interactive form (i.e., to allow
planners and budget-makers to project the consequences of their decisions).
Dashboards can greatly facilitate performance-based planning and budgeting,
a key mandate of recent federal legislation.

Figure 17: The VDOT Dashboard (Virginia DOT)
http://dashboard.virginiadot.org/default.aspx
Text intentionally left small to focus the reader on the overall image.
Your medium has a profound impact on your design. Zooming and filtering of
data is impossible if the medium is static. If your visualization is intended for a
large-scale poster or presentation board, then you can either expand the
dimensions of a single visualization or make a greater number of simpler charts.

The form and dimensions of the page or screen can and should drive the
arrangement and even the inclusion of information – if it is placed where the
audience will have to scroll down, flip a page, or turn around to see it, they may
not see it. If the visualization is to be delivered in a printed book, information on
some pairs of consecutive pages (i.e., facing pages, which form spreads) is far
easier to consume at once than on other pairs, where the pages are on reverse
sides of the same sheet.

The possibility of publishing content in web-based documents opens new
opportunities for your audience to tour through information and for presenting
interactive visualizations naturally in the course of a document. The Washington
Post produced a classic best practice for this approach in its 2014 feature
“Reimagining Union Station.”

With zero dimensions, the visualization shows how many bridges there are.
This could be accomplished with a stylized number, with a collection of small
bridge icons, or with a proportionally-sized box (in reference to some outside
point of comparison);

One dimension could be location (e.g., a map of bridges), NBI condition
(e.g., a bar chart), type (e.g., a pie chart or treemap), and so forth;

Two dimensions could be any pair of the above. For instance, location and
condition could be visualized at the same time using a choropleth, with
regions colored by average condition; and

Three dimensions could add another variable. For instance, if time were
added to the above, the choropleth map could be animated to show
changes in average condition in each region over time.

Tamara Munzner and Torsten Möller discuss dimensions in the language of
“marks and channels.” To them, a mark is a “basic graphical element or
geometric primitive” – a point, line, area, or volume. A channel is a means of
controlling appearance. Möller’s slide presentation on the topic lists position,
size, shape, orientation, and hue/saturation/lightness as channels.
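
The two-dimension case described above (location and condition shown together as a choropleth) depends on first reducing many bridges to one value per region. The sketch below shows one way to do that while skipping a sentinel value for missing ratings and checking that each rating falls on the 0-9 scale. The records, region names, and the choice of 99 as the sentinel are hypothetical.

```python
from collections import defaultdict

MISSING = 99  # hypothetical sentinel meaning "rating not reported"


def average_condition_by_region(bridges):
    """Average the 0-9 condition rating per region, ignoring sentinels."""
    ratings = defaultdict(list)
    for bridge in bridges:
        rating = bridge["condition"]
        if rating == MISSING:
            continue  # clean: drop meaningless values before averaging
        if not 0 <= rating <= 9:
            # validate: anything else outside the scale is an error
            raise ValueError(f"rating out of range: {rating}")
        ratings[bridge["region"]].append(rating)
    return {region: sum(vals) / len(vals) for region, vals in ratings.items()}


bridges = [
    {"region": "North", "condition": 7},
    {"region": "North", "condition": 5},
    {"region": "North", "condition": 99},
    {"region": "South", "condition": 4},
]

print(average_condition_by_region(bridges))
```

Each resulting average would then drive the color channel for its region; leaving the sentinel in would have pulled the North average far above the top of the scale.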
Figure 18: Excerpt from the Florida Transportation Plan (Florida DOT)
http://floridatransportationplan.com/
Visualization Strategy in the Real World
Brian Card and Mike Barry: “Visualizing MBTA Data”
The first 17 minutes of Barry and Card’s seminar at Simmons College are
discussed in Section 3.1. Moving on from data wrangling, they discussed their
strategy and process for visualizing the data.
Implementation: 31:00 – Barry and Card hosted their work at GitHub Pages
due to its simplicity, lack of cost, and unlimited traffic accommodation. They
added a date and header, and used AddThis to include sharing buttons (partially
to grant the site credibility for people stumbling across it). They implemented
Google Analytics to track unique visits and visitors. Finally, they added tags
to tell social media networks how to render an image, description, and title
when the page is shared.

Choosing the right tool depends on your strategy and your level of expertise. This
section describes many of the most useful visualization tools covering a range of
strategies and skill levels. We use our professional judgment to define the
ease-of-use of each of the tools.

Esri’s ArcGIS is the gold standard in GIS software. It is a full-fledged professional
tool, but even novice users can create simple maps. Developers can create
custom interactive web pages and apps using ArcGIS servers, APIs, and software
development kits (SDKs).

Platforms: Windows (desktop and server) | Online via web | API for
developing apps and web pages.

Cost (as of April 30, 2016): Desktop – $1,500 and up | Online – $2,500
for five users and up | Server – $5,000 and up for perpetual license | $100
for personal use | Discounts for non-government organizations, non-profits,
and schools.
Support: Esri provides online documentation and self-service and paid
support | Esri Developers Network | Esri-related conferences and user
groups | Extensive community of users | Books | Commercial support.

Publishing online: Via Esri cloud (requires service credits) or your own ArcGIS
server.

Support: Online community | Online documentation and tutorials | Books
| Commercial support.

Publishing online: QGIS Server and Web Client | Export to Leaflet or other
servers.

General Tools

The multi-purpose office tools allow users to build many of the most basic data
visualizations and, with practice, they can make elegant visualizations.

Platforms: Windows (desktop and cloud) | Mac OS X | Windows (Power BI).

Cost (as of April 30, 2016): Free (Power BI desktop and service) | $150 and
up (Office) or $70 per year and up (Office 365 – cloud) | $300 and up
(Visio) or $13 per user per month (Visio for Office 365).

Support: Microsoft provides online documentation and tutorials | Active user
community.

Publishing Online: Power BI can publish to the Power BI service.

Cost (as of April 30, 2016): Part of Creative Cloud, starting at $9.99 per
month for a single application.

Support: Adobe provides online documentation and tutorials | Active user
community.

Publishing Online: Not available.

General visualization tools allow you to upload data from a variety of sources
(e.g., Microsoft Excel, comma delimited, R). Once the data is in place, the
application can illustrate it in dozens of ways with limited customization. Finished
visualizations can be exported for use in reports and presentations. Some tools
facilitate hosting for interactive projects.

Platforms: Windows (desktop and server) | Mac OS X | Online via web.
Cost (as of April 30, 2016): $999 (personal desktop) | $1,999 (professional
desktop) | $10,000+ (server) | $500 per user per year (online) | Free
(Tableau Public) | Discounts for non-profits and educational use.
Support: Tableau provides online documentation and self-service support as
well as paid support | Tableau-related conferences and user groups |
Extensive community of users | Examples readily available (visualizations on
Tableau Public can be downloaded and reverse-engineered).
Publishing Online: Tableau Public | Tableau Online or Server | Hosted
visualizations can be embedded in other web pages.

Qlik is a general-purpose visualization environment with powerful and
easy-to-use tools for creating interactive data visualizations. With a paid version or cloud
hosting, you can embed visualizations or share them on the web. Qlik provides
an API that enables you to mashup and extend visualizations in sophisticated
Web applications.
Platforms: Windows (desktop) | Online via Web | API for developing apps
and web pages.
Cost (as of April 30, 2016): Desktop - free for personal or internal business
use | $20 per user per month for Qlik Sense Cloud | $1,500 per token (one
user or ten logins per month) | QlikView Enterprise (server) priced on hybrid
server and client access model.
Support: Qlik provides online forums, consulting, training, and conferences
| Active user community.
Publishing Online: Qlik Sense Cloud (share with up to five others, 250 MB
free).

For Developers
Custom, interactive visualizations like those seen in The New York Times
generally are developed in JavaScript (an internet browser coding language). To
build visualizations using these libraries, you will need software programming
skills and comfort with web publishing.

Data Driven Documents, or D3.js, is an open-source JavaScript library that
provides powerful visualization components. If you have strong
web-development skills, you can find an example visualization that fits your
strategy, copy the code, and build your own.
Platforms: JavaScript | Runs in all recent web browsers.
Cost (as of April 30, 2016): Free, open source.
Support: D3.js provides online documentation and lots of examples | Active
user community | Vast gallery of examples, many with source code shown.
Publishing Online: JavaScript scripts in a webpage, any web server.

You can add charts and graphs to Google Sheets, and you can access those
same visualizations and data through various APIs. Google Maps is accessible
via API, enabling various map-based visualizations. Fusion Tables is an
application to gather, explore, and share data tables. It helps you find public
data, visualize it, and host it online.
Platforms: JavaScript | Runs in all recent web browsers.
Cost (as of April 30, 2016): Free, under terms of Google APIs Terms of
Service (https://developers.google.com/terms/).
Support: Google provides online documentation and forums | Active user
community.
Publishing online: JavaScript scripts in a webpage, any web server.
R is a power tool for data wrangling and statistical computing that also creates
data visualizations. It is like a software development environment – the basic
package includes a command-line editor and interpreter. RStudio provides a
graphical development environment but still requires you to write scripts.
Several graphics packages make creating plots and charts fairly easy, and Shiny
(also from RStudio) produces interactive web pages.
3.6 · Putting It All Together
One practitioner’s example
Members of our team worked with the New Hampshire DOT to develop a Sankey
Diagram for the department’s Transportation Asset Management Plan.
The chart shows the flow of funds from revenue sources on the left – through
funds and programs in the center – to uses on the right, all proportionally-sized
and colored by revenue source.
Figure 21 shows the chart and the bullets that follow walk through how we
considered the elements of this Guide to produce it.
Figure 20: New Hampshire Funding Flows – Typical Year
(New Hampshire DOT, 2015)
Data Wrangling – We held a workshop to explain the types of data we
needed and what we planned to do with it. We collected written documents
(e.g., Citizen’s Guide to the Transportation System and annual reports for
the Turnpike and DOT) and spreadsheets (e.g., a comprehensive budget
book) describing cash flows.
We had a sense of who the audience would be and the story we wanted to
tell, so we refined the data so it had common revenue, program, and
expenditure categories and names. This took some effort.
Intent and Audience – The audience for this chart includes the public, FHWA,
internal staff, and legislature. The intent was to explain to this audience how
money is spent on different asset management programs, by asset (i.e., how
much did you spend on maintenance and how did you pay for it?).
We wanted to highlight connections among revenue, programs, and
investment categories. As we sketched with stacked bar charts, we could see
how revenue tied to programs but not how it related to expenditures. We
needed something that had more connections.
Analysis – The Sankey requires that every flow balances. The DOT does not
manage their income and investments like this, so we needed to make some
assumptions to tie them together. We went back and modified the data,
creating a hypothetical fiscal year that explicitly ties the flows together
through the whole process. We checked with the fiscal folks to make sure
that these assumptions were appropriate.
Choosing a strategy – The Sankey Diagram was effective at communicating
our intent to our audience. We wanted to make clear how the revenue
sources flowed through the diagram, so we kept them in the same color
scheme (e.g., all toll revenues are in blue). We added text throughout to
help the reader understand the chart. We also experimented with the
organization of the flows to ensure readability.
Tools and implementation – We used Excel to wrangle the data. We
generated the diagram using SankeyMATIC, a free online tool that is built in
JavaScript but is easy to learn for those without coding experience. The final
graphic was built in Adobe Illustrator by tracing a screenshot of the raw
diagram; this gave us much more control over the look and feel of the
chart.
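The balancing requirement described under Analysis can be checked before any drawing starts. The sketch below is one possible approach, not the project team's actual workflow: it verifies that every pass-through node (a fund or program with money both entering and leaving) balances, then writes the flows as "Source [amount] Target" lines, the style of text input we understand SankeyMATIC to accept (confirm against the tool's current documentation). The node names and dollar amounts are hypothetical and are not NHDOT figures.

```python
from collections import defaultdict

# Hypothetical flows as (source, target, amount in $ millions).
# Illustrative values only; these are NOT NHDOT figures.
FLOWS = [
    ("Toll Revenue", "Turnpike Fund", 120),
    ("Gas Tax", "Highway Fund", 80),
    ("Turnpike Fund", "Maintenance", 70),
    ("Turnpike Fund", "Capital Projects", 50),
    ("Highway Fund", "Maintenance", 30),
    ("Highway Fund", "Operations", 50),
]


def unbalanced_nodes(flows):
    """Return {node: (inflow, outflow)} for pass-through nodes that do not balance."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, amount in flows:
        outflow[source] += amount
        inflow[target] += amount
    # Only nodes with both inflows and outflows (the middle of the diagram)
    # must balance; pure revenue sources and pure uses are endpoints.
    middle = set(inflow) & set(outflow)
    return {
        node: (inflow[node], outflow[node])
        for node in sorted(middle)
        if abs(inflow[node] - outflow[node]) > 1e-9
    }


def to_sankey_text(flows):
    """Format each flow as a 'Source [amount] Target' line."""
    return "\n".join(f"{s} [{a}] {t}" for s, t, a in flows)


if __name__ == "__main__":
    problems = unbalanced_nodes(FLOWS)
    if problems:
        for node, (money_in, money_out) in problems.items():
            print(f"{node}: in {money_in}, out {money_out}")
    else:
        print(to_sankey_text(FLOWS))
```

Running the check on each revision of the spreadsheet export makes it easier to see which assumptions are still needed to tie revenues to expenditures.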
Color/Hue: The twelve purest and brightest colors, including the three
primary colors (red, blue, yellow), three secondary colors (violet, green,
orange), and six tertiary colors (blue-violet, red-violet, yellow-green,
blue-green, yellow-orange, red-orange);
Tint: The lightened version of any color, also known as pastel, created by
adding white. Tints can range from slightly lighter than the hue to almost
white;
Shade: The darkened version of any color, created by adding black. Shades
can range from barely darker than the hue to almost black; and
Tone: The grayed version of any color, created by adding both white and
black. Tones typically are considered more appealing color combinations
than simple tints or shades.

Palette
When creating visualizations, you need to determine a color palette that can be
used consistently throughout the design. The palette can be monochromatic
(using only one color), black and white, full color, or neutral. Color palettes for
visualizations typically comprise a primary palette and secondary palette. The
primary palette includes the colors that are used most frequently, while the
secondary palette provides additional complementary colors that can be used as
needed throughout the design. The secondary palette colors often are bright, as
they are intended to be accents.
Google has created a helpful online video demonstrating that within the selected
primary palette alone, there are various options regarding tints and shades of
the hue, before incorporating secondary palette accents:
https://design.google.com/videos/palette-perfect/

Scheme
Color schemes are informed by color palette. Depending on the selected color
palette, the color scheme will include the tints, shades, and tones of the primary
colors and the accent colors used in all designs. When selecting a color scheme,
it is important to consider color insensitivity and color blindness. When using reds
and greens together, choose highly saturated, darker shades rather than light
tints, and use thicker lines. Color blindness is fairly common and falls across a
wide spectrum; it can be helpful to identify a few people in your workplace with
color blindness who can help to test your materials. Also consider how easily
distinguishable the various tints, shades, and tones are from one another within
the same hue family, as strong visuals use color to clearly demarcate separate
pieces.
There are many resources on color theory that define good color combinations
to choose for your scheme. If you aren’t sure where to get started, see Pantone
for excellent advice:
https://www.pantone.com/pages/MYP_myPantone/mypantone.aspx
When choosing a color scheme to represent continuous data (e.g., in a heat
map), it is best to avoid using the colors of the rainbow:
There is no “greater than” and “less than” order to colors the way there is
with light to dark.
It is difficult to spot differences: Human eyes are not good at detecting color
differences. This makes it difficult to spot differences among dimensions.

Contrast
When choosing colors for text and background, contrast is key. If you plan to
stray from black text on a white background, you should consider the
transparency/opacity of your text. Transparency indicates how easy it is to see
through the color; opacity indicates how difficult it is to see through the color.
Light text on a dark background typically requires a higher level of opacity than
dark text on a light background. Brightness is another factor to consider.
Our eyes find it easiest to read text that is different in terms of color and in
brightness from the selected background. Choosing contrasting colors, such as
colors on the opposite side of the color wheel, helps to ensure legibility. For
example, dark violet text does not work well against a blue background, but it
reads well against yellow (particularly light yellow).
There are online color contrast checkers that can help you verify whether you
have chosen colors with ample contrast ratios, e.g., WebAIM:
http://webaim.org/resources/contrastchecker/
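If you want to check contrast inside a script rather than in a browser, the ratio that checkers of this kind report can be computed directly. The sketch below follows the relative luminance and contrast ratio definitions in the W3C Web Content Accessibility Guidelines (WCAG) 2.x; the function names and the sample hex values are our own illustrative choices, not part of any tool named in this Guide.

```python
def _linear(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance (0 = black, 1 = white) of a color such as '#1A2B3C'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """Contrast ratio from 1 (identical colors) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Hypothetical hex values standing in for the violet, blue, and
    # light yellow mentioned in the text.
    violet, blue, light_yellow = "#4B0082", "#0000FF", "#FFFFE0"
    print(round(contrast_ratio(violet, blue), 2))
    print(round(contrast_ratio(violet, light_yellow), 2))
```

WCAG Level AA asks for a ratio of at least 4.5:1 for normal-size text, so a helper like this can flag label and background pairs that need another look before publication.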
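The tint, shade, and tone definitions in the color discussion above translate into a few lines of arithmetic, which can be handy when you need a consistent ramp from one base hue. The sketch below blends each RGB channel linearly toward white, black, or mid-gray. That is a simplification we are assuming for illustration (design tools may mix colors in other color spaces), and the function names and the choice of (128, 128, 128) as "gray" are hypothetical.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
GRAY = (128, 128, 128)  # assumed mid-gray, standing in for "white plus black"


def _mix(rgb, other, amount):
    """Blend rgb toward other; amount runs from 0.0 (unchanged) to 1.0 (other)."""
    return tuple(round(c + (o - c) * amount) for c, o in zip(rgb, other))


def tint(rgb, amount):
    """Lighten a color by mixing in white."""
    return _mix(rgb, WHITE, amount)


def shade(rgb, amount):
    """Darken a color by mixing in black."""
    return _mix(rgb, BLACK, amount)


def tone(rgb, amount):
    """Gray a color by mixing in mid-gray."""
    return _mix(rgb, GRAY, amount)


if __name__ == "__main__":
    base = (200, 100, 0)  # hypothetical primary palette color
    # A five-step ramp of tints of the base hue.
    for step in range(5):
        print(tint(base, step * 0.2))
```

A ramp built this way stays within one hue family, so it is worth checking that neighboring steps remain easy to tell apart.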
4.4 · Federal Requirements for Style
Under Section 508, the federal government outlines a number of standards to
guarantee equal access to information conveyed electronically for those with or
without disabilities. All visualizations developed, procured, or maintained by
federal departments and agencies must comply with the standards. To familiarize
yourself with these standards, use these resources:
http://www.fhwa.dot.gov/publications/research/general/03074/index.cfm

Technical Standards
Provide a text equivalent for every non-text element (e.g., via “alt”,
“longdesc,” or element content);
Synchronize equivalent alternatives for any multimedia presentation;
Design webpages so all information conveyed in color also is available
without color, as in context or markup;
Organize all documents to be readable without associated style sheets;
Provide redundant links for each active region of a server-side image map;
Provide client-side image maps rather than server-side image maps, except
where regions cannot be defined with an available geometric shape;
Identify both row and column headers in data tables;
Use markup to associate data cells and header cells for data tables that have
two or more logical levels of row or column headers;
Title frames with text that facilitates frame identification and navigation;
In the instance that compliance cannot be accomplished in another way,
provide a text-only page, with equivalent information/functionality, to ensure
your website complies with stated requirements. Update the text-only page
each time the primary page changes;
For pages using scripting languages to display content or create interface
elements, identify all information provided by the script with functional text
that assistive technology can read;
When a webpage requires an applet, plug-in, or other application to
interpret page content, include a link to a plug-in or applet that complies
with §1194.21 (a) through (l);
Create electronic forms completed online to allow people using assistive
technology to access the information, field elements, and functionality
required for completion and submission of the form, including all directions
and cues; and

Formatting Documents
Figure captions must describe the chart in the caption title, as in Figure 22.
Figure 21: Labor Division by Income Level (Cambridge Systematics)
[Bar chart: Labor Division by Income; categories Clerk, Designer, Artist, CEO; series Low Income, Medium Income, High Income; vertical axis 0 to 6]
Alt text and table summaries for figures, tables, and equations must be
clean. Where needed, break them into long descriptions (the
HTML Validator [https://validator.w3.org/] suggests a maximum of 75
characters; others suggest 100). When cleaning, make no references to
color, remove special characters if they’re not necessary (though they are
allowed), and spell out acronyms in summaries and alt text.
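Some of the alt text rules above lend themselves to an automated first pass before a human review. The sketch below is a minimal, assumed checker rather than an official Section 508 tool: it flags empty text, text longer than a chosen limit (100 characters by default, per the looser of the two limits mentioned above), and references to color. The list of color words is our own short, hypothetical list, and the check does not attempt the acronym rule.

```python
import re

# Hypothetical, deliberately short list of color words to flag.
COLOR_WORDS = {
    "red", "orange", "yellow", "green", "blue", "violet",
    "purple", "black", "white", "gray", "grey",
}


def alt_text_issues(alt_text, max_length=100):
    """Return a list of human-readable problems found in one alt text string."""
    issues = []
    if not alt_text.strip():
        issues.append("missing text equivalent")
    if len(alt_text) > max_length:
        issues.append(
            f"longer than {max_length} characters; consider a long description"
        )
    words = set(re.findall(r"[a-z]+", alt_text.lower()))
    found = sorted(words & COLOR_WORDS)
    if found:
        issues.append("refers to color: " + ", ".join(found))
    return issues


if __name__ == "__main__":
    for text in (
        "Bar chart of labor division by income level",
        "The red bars show low income",
    ):
        print(repr(text), alt_text_issues(text))
```

A word-list check will produce false positives (for example, a place name that contains a color word), so treat the output as a prompt for review rather than a verdict.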
Image Requirements
The Section 508 requirements specifically for images include:
Image weights are less than 30 K (when possible without making the image
illegible);
Image widths are less than 420 pixels (when possible without making the image
illegible);
Save all files as fig[section#][figure# in section] (e.g., “fig41.jpg”), using all
lowercase letters; and
File names should never exceed 20 characters or contain dashes, special
characters, or spaces (underscores are allowed).
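The file naming rules above are mechanical enough to script. The sketch below builds a name in the fig[section#][figure#] pattern and reports violations of the lowercase, length, and character rules. It is an illustration under our own assumptions: the function names are hypothetical, and we assume the 20-character limit applies to the whole name including the extension, which the rules do not spell out.

```python
import re


def figure_filename(section, figure, extension="jpg"):
    """Build a name such as 'fig41.jpg' for section 4, figure 1."""
    return f"fig{section}{figure}.{extension.lower()}"


def filename_issues(name, max_length=20):
    """Return a list of naming-rule violations for one file name."""
    issues = []
    # Assumption: the limit counts the extension as well as the stem.
    if len(name) > max_length:
        issues.append(f"longer than {max_length} characters")
    if name != name.lower():
        issues.append("contains uppercase letters")
    stem, dot, _extension = name.rpartition(".")
    if not dot:
        stem = name  # no extension present
    if not re.fullmatch(r"[A-Za-z0-9_]+", stem):
        issues.append("contains dashes, spaces, or special characters")
    return issues


if __name__ == "__main__":
    for candidate in (figure_filename(4, 1), "Fig 4-1.jpg"):
        print(candidate, filename_issues(candidate))
```

Image weight and width are not checked here because they require reading the file itself, for example with an imaging library.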
To envision information … is to work at the intersection of image,
word, number, art.
Edward Tufte
Transportation planning is a field and an industry built for visualization.
Information of relevance to planners can be readily illustrated, be it the design
alternatives for a project, traffic flow in the peak hour, bicycle mode share, or
color-of-money. Transportation professionals must also frequently communicate
plans, objectives, and justifications to lay stakeholders and a public in which
“everyone who drives thinks they’re a traffic engineer.”
Visualization is… taking advantage of the fact that we are so
programmed to understand the world around us in terms of what
we see.
Fernanda Viegas
When visualizing information, you should expect that many in your audience will
likely “just look at the pictures.” Not only should a visualization tell a story, but it
should tell a complete story, with a subject, a function, and a desired outcome.
Visualizations need to succeed in two areas: be engaging, and be
easily understandable.
Jean-Daniel Fekete
Because your visualization is designed around your audience, you should use
imagery that speaks to them. Use colors that correspond to a client’s logo, or to
a local sports team. Use human-recognizable objects to create pictograms in
place of bar or bubble charts. Use logos in place of text to take advantage of
their brand equity and immediate recognition. Above all, provide information
clearly to send the message that you are both a reputable and innovative source
for that information.
The first sign that a visualization is good is that it shows you a
problem in your data.
Martin Wattenberg
For even simple datasets, visualizing can provide insight that leads to better data
and, in turn, to better visualizations. This positive feedback loop is at the core of
complex and interactive data visualization, and the refinement of both end
products increases with each iteration.
Text intentionally left small to focus the reader on the overall image.
This appendix lists examples of best practice visualization, selected by the project team. They are also provided on
the “Examples” tab of the Vizguide website (vizguide.camsys.com).
Opening Notes
Best practice examples of transportation visualizations were collected from several sources, including:
For presentation both in this appendix and on the website, the visualizations were grouped into 10 subjects:
Asset Management
NHDOT Funding Flows – Typical Year
New Hampshire Department of Transportation, 2015
Ridescore
Delaware Valley Regional Planning Commission, 2015
Environmental
Massachusetts Clean Energy and Climate Plan
Massachusetts Executive Office of Energy and Environmental Affairs, 2015
Transit
Transit and Density
Alain Bertaud and Harry W. Richardson, 2004
Remix
Remix is a transit planning start-up with a visually compelling browser-based
visualization and analysis tool. It allows the user to draw transit service on a
map, place stations, and forecast operating costs, performance, and economic
impact based on many integrated datasets.
Highway Mobility
Transportation Outlook
Mid-America Regional Council, 2014
Top US Interstates
Federal Highway Administration, 2015
Timeline
CATT Lab, University of Maryland, 2012
Freight
Incentives for Truck Use of SH-130
Texas Transportation Institute, Texas A&M University, 2015
GEDVIZ
Bertelsmann-Stiftung, 2016
Safety
Five Years of Traffic Fatalities
John Nelson and IDV Solutions, 2010
http://www.boostlabs.com/portfolio-item/national-highway-traffic-safety-administration-fatality-analysis-reporting-system-fars/
Socioeconomic
Mesa County Employment
Mesa County, Colorado, 2014
People Movin’
Carlo Zapponi
Key
Y = Fully functional
P = Partially functional
E = Effort required
X = Plugin required
Tools: PowerPoint | Photoshop | Charts API | Maps API | Illustrator | Qlikview | Power BI | Tableau | ArcGIS | Leaflet | Fusion Tables | Sheets | QGIS | Excel | d3.js | Visio | R
Map
Choropleth Y Y Y Y Y Y Y Y Y
Bubble Y Y Y Y Y Y Y Y Y
Route Y Y Y
Pie Map Y Y Y
Dot Density Y Y Y
Flow Y Y Y
Area Cartogram Y Y Y
Bar
Horizontal/Vertical Y Y Y Y Y Y Y Y Y Y
Clustered Y Y Y Y Y Y Y Y Y Y
Stacked Y Y Y Y Y Y Y Y Y Y
Diverging Y Y Y Y Y Y Y Y Y Y
Bullet Y Y Y Y Y
Histogram Y P P Y Y
Pyramid Y Y Y Y Y
Radial Y Y Y Y Y
Line/Area
Segmented Y Y Y Y Y Y Y Y Y
Smoothed Y Y Y Y Y Y Y Y Y
Regression P P Y P P Y
Area Y Y Y Y Y Y Y Y Y
Stacked Area Y Y Y Y Y Y Y Y Y
Streamgraph X Y Y Y
Pie
Pie Y Y Y Y Y Y Y Y
Donut Y Y Y Y Y Y Y Y
Sunburst Y Y Y Y Y
Flow
Flowchart E Y X Y Y Y Y
Sankey E Y X Y Y Y Y
Heat
Matrix Y Y Y Y X Y Y Y Y
Calendar Y Y Y Y X Y Y Y Y
Smoothed Area Y Y Y Y Y Y
Scatterplot
Scatterplot Y Y Y Y Y Y Y Y
Bubble Y Y Y Y Y Y Y Y
Motion Y Y
Pictogram
Dot Matrix E E E Y E Y Y Y Y Y Y
Symbol Bar E E E Y E Y Y Y Y Y Y
Treemap
Treemap Y Y Y Y Y Y Y Y
Circle Packing Y Y
Node-Link
Node-Link E X X X Y Y Y
Arc X X X Y Y Y
Tree E X X Y Y Y
Chord E X X Y Y Y
Force-Directed E X X Y Y Y
Other
Table Y Y Y Y
Parallel Coordinates Y Y
Spider Y Y Y
Word Cloud Y Y Y
Gauge E Y Y Y Y