Data Visualization

By: Taggert J. Brooks

Representing Data Graphically


Data visualization, sometimes called information visualization - or infovis1 for short -
comes from the convergence of computer science, statistics and design. It is a marriage
between science and art, between the left and right halves of the brain. The goal is to
make data presentation interesting, aesthetically pleasing and, ideally, informative.
Good data visualization goes further by revealing relationships in the data that might
otherwise have gone unnoticed. Because it involves no hypothesis tests, it is easy to
discount visualization as unscientific, but that would be a mistake. There are many uses
of data visualization, and the reality is that hypothesis testing can bore an audience, if
not completely exceed its level of understanding. Data visualization, then, is a means to
an end for statisticians who want to be better communicators, and it is a pathway to a
better understanding of the data for the designers amongst us.
"In our excitement to produce what we could only make before with great effort, many
of us have lost sight of the real purpose of quantitative displays - to provide the
reader with important, meaningful, and useful insight."
Stephen Few
I would add that good visualization techniques will not only help the reader, but also
help the producer of the visualization to discover meaningful insights.
This document is meant to be an introduction to different visualization techniques, and
though I provide some practical how-to, I do not provide everything. Where I fail,
Google and the internet can fill in the gaps.
Too Much Data
The internet has led to an explosion in the amount of data that is collected, stored
and easily accessible. It has done this by dramatically lowering the costs of those
activities. The problem we now face is filtering the valuable data from the worthless data
and determining how to use it to inform business decisions or research. A recent
example of the ubiquity of new data can be taken from the presidential election. We
have data on the frequency of word searches in Google for each minute of the Vice
Presidential debate between Senator Joe Biden and Governor Sarah Palin.2 Apparently
people were trying to figure out exactly what a Maverick actually is.
What type of media will you use to make your presentation? How long does your
audience have to take in the data? The longer the audience has, the more data dense the
visualization can and should be. The less time and autonomy your audience has to
peruse the data, the more simplified the visualization should be.

1 A wiki dedicated to Infovis: http://www.infovis-wiki.net/index.php?title=Main_Page
2 A graph of the searches can be found here:
http://www.readwriteweb.com/archives/google_has_changed_political_d.php

Will it be a written report, a PowerPoint presentation, or is the data going to be
rendered on the web? In other words, will the visualization be static or dynamic? These
questions are some of the first you should answer when selecting a visualization
method.
Visualization is about Discovery, Discerning Patterns, and Disseminating
Information.
Below we have a nice infographic describing the continuum from data collection to data
use.3

A good example of the effectiveness of visualization for identifying outliers, or data
errors, can be found below.

3 The infographic is derived from
http://www.visualizingeconomics.com/2009/07/12/data-scienist-data-geek-designer/

The picture above is a great way of using visualization to identify errant data. The
underlying data in this case must be no more than 100%, yet we can see one mistaken
observation.4
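The check that the picture makes visually can also be sketched in code. The following is a minimal Python illustration, assuming the values are percentages that must lie between 0 and 100. The sample numbers and the function name are hypothetical and are not taken from the data behind the picture.

```python
def find_out_of_range(values, low=0.0, high=100.0):
    """Return (index, value) pairs for observations outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical percentages; the fifth entry is a deliberate data-entry error.
percentages = [45.2, 88.0, 12.5, 99.9, 180.0, 63.1]
suspects = find_out_of_range(percentages)
print(suspects)  # [(4, 180.0)]
```

A list like this only tells you where to look. Whether the flagged value is an error or a genuine outlier still takes judgment.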
Selecting the Right Graph
Design is choice. The theory of the visual display of quantitative information consists of
principles that generate design options and that guide choices among options. The
principles should not be applied rigidly or in a peevish spirit; they are not logically or
mathematically certain; and it is better to violate any principle than to place graceless
or inelegant marks on paper.
Edward Tufte, The Visual Display of Quantitative Information
Selecting the appropriate display can be difficult because it requires a good
understanding of the nature of your data and of statistics, as well as a good
understanding of design principles. There are many possibilities for a given variable or
dataset, but you need a place to start. There are a few web pages which try to help, but
none addresses both statistics and design simultaneously.5 As the quote by Tufte
suggests, the choice of design does not easily fit into a simple algorithm.

4 This is from the higher ed weblog http://blog.une.edu.au/robbi/2009/08/06/data-testing-usingvisualisation/
5 This webpage http://interface.fh-potsdam.de/infodesignpatterns/news.php is closer to the visual end,
while this webpage http://www.ncsu.edu/labwrite/res/gh/gh-graphtype.html does a better job of helping
select the appropriate graph from a statistics perspective, and this one helps choose the right statistical test:
http://www.ats.ucla.edu/stat/stata/whatstat/default.htm

Some other examples of websites which try to provide guidance in the choice of
appropriate representations can be found in the blog entry titled "Things should be made
as simple as possible, but not any simpler",6 which is a famous Einstein quote.
Broadly, choosing a chart involves three steps:
1. Determine the relationship you want to display
2. Determine if you want to emphasize individual values or the overall pattern
3. Determine the chart type
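The three steps above can be sketched as a small lookup. This is only an illustration under stated assumptions: the relationship names, the four-slice cutoff for pie charts, and the function `choose_chart` are all hypothetical, and as the Tufte quote warns, real design choices do not reduce to a table like this.

```python
def choose_chart(relationship, emphasize="pattern", n_categories=None):
    """Suggest a chart type from a relationship and an emphasis.

    The mapping is an illustrative assumption, not a definitive rule set.
    """
    if relationship == "time series":
        # Step 2: a line stresses the overall pattern, columns stress values.
        return "line graph" if emphasize == "pattern" else "column chart"
    if relationship == "part to whole":
        # Pie charts get hard to read as the number of slices grows.
        if n_categories is not None and n_categories <= 4:
            return "pie chart"
        return "bar chart"
    if relationship == "categorical comparison":
        return "bar chart"
    if relationship == "two continuous variables":
        return "scatter plot"
    raise ValueError(f"no suggestion for relationship: {relationship!r}")


print(choose_chart("time series", emphasize="values"))  # column chart
print(choose_chart("part to whole", n_categories=8))    # bar chart
```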
Bad charts
Before we begin discussing some of the common, and not so common, visualizations, it
might help to provide some links to bad charts and their improvements. Stephen Few
provides some excellent examples of bad charts and then provides recommendations
for fixing the problems.7 Another set of examples is provided here.8
Many of these criticisms and corrections are based upon the rules and suggestions from
the work of Edward Tufte. His rules can be found at his website.9

6 http://blog.xlcubed.com/chart-rules-as-simple-as-possible-but-not-any-simpler/ A follow up can be found
here as well: http://blog.xlcubed.com/household-income-distribution-1967-2005-as-small-multiples-chart/
Still another example of a chart chooser can be found here: http://chartchooser.juiceanalytics.com/, which
also produces Excel templates from your choices.
7 http://www.perceptualedge.com/examples.php
8 http://lilt.ilstu.edu/jpda/charts/bad_charts1.htm
9 http://www.washington.edu/computing/training/560/zz-tufte.html

Seth Godin, the famed marketer, has rules for making good graphs.10
Graph Types
Microsoft Excel is a common tool for creating graphic representations, but sadly its
default choices are often not good design choices, and many of the default graphs it
provides should never be used. While Excel 2007 is much better than the horrible
defaults in Excel 2003, both can benefit from some alterations. For some details on
altering the charts after Excel has created one using the default templates, see the links
below.11, 12
Some traditional graphical means of data representation, which can be found under the
INSERT ribbon in Excel 2007:
Pie chart
The pie chart is useful for representing the relative proportions of a few categories. The
more categories, the greater the number of slices, the more difficult the chart is to
read.

The field of information visualization is rather new, and like any new field it has very
impassioned people with starkly different opinions. For some, their beliefs are
almost religious, and the rules they profess are delivered with the same vigor as a Baptist
minister delivering a sermon from the pulpit. An example of this occurred in the
blogosphere when marketing guru Seth Godin suggested there should be no more bar
charts, only pie charts. This led to a swift reply from the community of infovis folks,
many of whom countered with the exact opposite advice. Remember the quote from
Tufte above: the reality is always somewhere in between, born of the exercise of good
judgment.
The problem with pie charts - as infovis people will tell you - is that consumers of
visualizations have a hard time estimating angles. In fact, they get them wrong, thus
drawing the wrong inference from the slices of a pie chart. People are better at visually
judging height, which is why many infovis people prefer the column chart.13 The visual
hierarchy of Cleveland is provided at this website.14

10 http://sethgodin.typepad.com/seths_blog/2009/07/how-to-make-graphs-that-work.html
11 How to alter the defaults in Excel: http://blog.xlcubed.com/defaults-in-excel-charting/
12 http://www.juiceanalytics.com/writing/fixing-excel-charts/
13 http://seedmagazine.com/content/article/getting_past_the_pie_chart/
14 http://www.processtrends.com/TOC_data_visualization.htm

15

Bar and Column Charts


Bar charts are often good for representing categorical data. You can present the
frequency of responses in each category, or the relative frequency.16 You can also
present the frequency or relative frequency of one variable over the groups or
categories of another variable, making it an excellent choice when you have two
categorical variables.
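For readers who prepare their data outside Excel, the frequency and relative frequency described above can be computed in a few lines. This is a minimal sketch in Python. The survey responses are hypothetical and `frequency_table` is a name introduced here for illustration.

```python
from collections import Counter


def frequency_table(responses):
    """Return {category: (count, relative frequency)} for a non-empty list."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


# Hypothetical survey responses.
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "no"]
for category, (count, share) in frequency_table(responses).items():
    print(f"{category:10s} {count:3d} {share:6.1%}")
```

Either column can then be handed to a bar chart: counts when the absolute numbers matter, shares when groups of different sizes are being compared.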
[Figure: example column chart, bar chart, and stacked bar chart]

Here is a recent bar chart I used to highlight the US debt to GDP ratio. Notice the use of
the single red bar to draw attention to the US relative to the rest of the OECD. Imagine
how ugly and confusing this would look if I used a different color for every
country. How would this look if I used the same color for every country? Obviously this
works in color, but would it work in grayscale?
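The single-highlight idea is easy to express in code. The sketch below builds one color per bar, with a muted color for every country except the one being emphasized. The function name and the short country list are illustrative assumptions, and the resulting list is the kind of value many plotting libraries accept as a per-bar color argument.

```python
def highlight_colors(labels, focus, focus_color="red", base_color="gray"):
    """Return one color per label: focus_color for the focus, base_color otherwise."""
    return [focus_color if label == focus else base_color for label in labels]


countries = ["Japan", "Greece", "Italy", "United States", "Germany"]
colors = highlight_colors(countries, focus="United States")
print(colors)  # ['gray', 'gray', 'gray', 'red', 'gray']
```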

15 http://peltiertech.com/WordPress/pie-chart-for-pi-day/
16 Most of the charts in this article were produced in Microsoft Excel 2007, unless otherwise noted. They
were copied into Word 2007 using the Paste > Paste Special > Microsoft Excel object function.

[Figure: bar chart titled "2008 Debt to GDP Ratio for OECD", horizontal axis from 0 to 180.
Countries as listed on the chart: Japan, Greece, Italy, Belgium, Portugal, Hungary, United
Kingdom, Austria, France, Netherlands, Poland, Iceland, United States, Turkey, Germany,
Sweden, Spain, Denmark, Finland, Korea, Canada, Ireland, Czech Republic, Slovak Republic,
Mexico, Switzerland, New Zealand, Norway, Luxembourg, Australia.]

Line Graph
The traditional line graph is generally used to show a single variable (usually
continuous) over time, with time represented on the horizontal axis. It could also
be used to show the relative frequency of a single response category over time.
[Figure: example line graph]

[Figure: line graph titled "U.S. Payroll Employment: Total Nonagricultural: SA, Thousands of
Persons", January 1990 to January 2008]

A few quick notes about the above graph. I've removed the horizontal gridlines as they
were an example of ink with no purpose. The background fill of the chart area has been
changed to white. I added shaded bars to denote recessions. If I were to improve this
further, I would probably reduce the number of labels on the horizontal axis, say to
every 36 months rather than 24. I'd also probably reduce the number of labels on
the vertical axis, as it currently feels a bit cluttered. Finally, I might eliminate the title
altogether and make a very small footnote that contained the same information. Or
maybe just title the chart "Employment" and relegate the details to the footnote.
Area Chart
An area chart is a line chart with the area below the line shaded. This can be useful
when you have two lines over time and one represents a subset of the other. For
example, you could have retail sales over time broken into two categories, durable and
non-durable goods.
[Figure: example area charts, the second covering 1995 to 2004]
Scatter Plot
Scatter plots are useful when you have two continuous variables with one represented
by the X axis and the other on the Y axis. A third variable can be used to measure
another attribute of the points, yielding a bubble chart, which will be discussed later.

[Figure: example scatter plot, both axes running from 0 to 100]

Tables
We should not always rush to make a chart. Sometimes just presenting the numbers in
tabular form is sufficient to get your point across, or maybe you blend both. Below are
two examples using the conditional formatting in Excel 2007, which blends the graphic
design of a chart with the data in tabular form.17
Leisure     Time Spent
biking      125
hiking      40
reading     30
singing     25
dancing     10
cleaning    5

[Shown twice in the original, side by side, each with a different conditional formatting style.]

Whenever presenting data like this it is useful to rank order the data from largest to
smallest. Failure to do so makes it a bit harder for the reader to sift through the data as
you can see from the example below.
Leisure     Time Spent
biking      125
hiking      5
reading     50
singing     75
dancing     10
cleaning    80

[Shown twice in the original, side by side, each with a different conditional formatting style.]
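The rank ordering recommended above is a one-line sort in most tools. As a minimal Python sketch, using the unsorted leisure figures from the example:

```python
leisure = {"biking": 125, "hiking": 5, "reading": 50,
           "singing": 75, "dancing": 10, "cleaning": 80}

# Sort by time spent, largest first.
ranked = sorted(leisure.items(), key=lambda item: item[1], reverse=True)

for activity, time_spent in ranked:
    print(f"{activity:10s} {time_spent:4d}")
```

In Excel the equivalent is sorting the range on the value column in descending order before applying the conditional formatting.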

A simple way to quickly deemphasize the numbers is to change the font of the numbers
to white.
Leisure     Time Spent
biking      125
hiking      40
reading     30
singing     25
dancing     10
cleaning    5

17 In the Home ribbon select Conditional Formatting > Data Bars

The one very unfortunate issue with this technique is that Microsoft Excel violates an
important statistical and visualization principle with its data bars. Zero values should be
represented by the absence of any color, bar or indicator. Yet no matter how small the
lowest quantity in the range of cells, its bar appears to be about 5%, even if the value is
zero, as can be seen in the example below.18
Leisure     Time Spent
biking      125
hiking      40
reading     30
singing     25
dancing     10
cleaning    0
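The principle that a zero value should draw nothing can be illustrated with a text-only data bar. This is a sketch of the principle under stated assumptions, not a description of how Excel draws its bars: the function name `data_bar`, the `#` character, and the 20-character width are all choices made here for illustration.

```python
def data_bar(value, max_value, width=20):
    """Return a text bar whose length is proportional to value / max_value.

    A zero value yields an empty string, so nothing is drawn for it.
    """
    if max_value <= 0 or value <= 0:
        return ""
    return "#" * round(width * value / max_value)


leisure = {"biking": 125, "hiking": 40, "reading": 30,
           "singing": 25, "dancing": 10, "cleaning": 0}
largest = max(leisure.values())
for activity, time_spent in leisure.items():
    print(f"{activity:10s} {time_spent:4d} {data_bar(time_spent, largest)}")
```

Because the bar length is rounded to whole characters, a very small positive value can also round down to an empty bar. A finer-grained renderer would avoid that, but the zero case stays empty either way.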

Spark Lines
Sparklines are small inline line graphs developed by Edward Tufte19.
GDP [5.8%]20
Notice how simple the sparkline is. We have removed the clutter of the Y and X axis
labels, yet the important information is still there. You see the relative values: clearly
the series is not currently at its highest value, yet it is higher than the previous value.
Compare that to the more traditional graph below:

[Figure: traditional line chart titled "GDP", 1995 to 2004, vertical axis from 0 to 10]

18 Thanks to the excellent Juice Analytics for making this point:
http://www.juiceanalytics.com/writing/excel-2007-and-lie-factor/
19 Edward Tufte's explanation of the theory and practice of sparklines:
http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1
20 The sparkline was created with the free open source add-in for Microsoft Excel, called TinyGraphs. It
can be found here: http://www.spreadsheetml.com/products.html.

This representation clearly consumes more space, and invites the reader to linger on
the chart, rather than on the point you are trying to make about the data. However, this
type of chart has its place. For example, it might be a better representation if it is
important for the reader to see that the highest value occurred in 1996, or that the
lowest value was in 1995, or if you want them to easily see that GDP fluctuates between
2% and 6%.
It is important to note that sparklines can be more than just line charts. They can be bar
charts, pie charts, etc. The term sparklines merely refers to what Edward Tufte calls
"Intense, Simple, Word-Sized Graphics". Sparklines are obviously not well suited for
PowerPoint-type presentation graphics, but are well suited for written reports, or the
currently in vogue data dense business intelligence reports referred to as dashboards.21
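A word-sized graphic can even be built from plain text. The sketch below is one way to do it in Python, using the eight Unicode block characters as bar heights. It is not the TinyGraphs add-in mentioned in the footnote, and the growth figures are hypothetical rather than the GDP series plotted above.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters, scaled to the data range."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return BLOCKS[0] * len(values)
    return "".join(
        BLOCKS[int((v - low) / span * (len(BLOCKS) - 1))] for v in values
    )


# Hypothetical growth rates; not the GDP series plotted above.
growth = [2.0, 6.0, 4.5, 4.0, 3.0, 3.5, 2.5, 3.0, 4.0, 5.8]
print(f"GDP {sparkline(growth)} [{growth[-1]}%]")
```

As in the example above, the series sits inline next to its label and latest value, with no axes at all. The reader gets the shape of the data and the one number that matters.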
Bullet Graph
The bullet graph, due to Stephen Few, is another graphic designed for dashboards.22

There is also a Google gadget API for use in Google Docs that will produce this.23
Spine Plots / Mosaic Plots / Matrix Charts
These are best used for categorical data. Notice that we have added another dimension
to the data by making the width of the bar proportional to the fraction of cars in that
category (domestic versus foreign), thus taking the traditional bar chart and adding
another level of data.

Made with Stata's ado file spineplot. Jon Peltier has a solution for Excel which he calls
a Matrix Chart.24 It is available in the statistical language R as well.25
21 For some examples see http://www.ozgrid.com/excel-add-ins/spark-maker-explained.htm
22 The picture comes from Stephen Few's Perceptual Edge here:
http://www.perceptualedge.com/blog/?p=375
23 http://dealerdiagnostics.com/blog/2008/09/the-ddr-bullet-graph-gadget/
24 http://pubs.logicalexpressions.com/Pub0009/LPMArticle.asp?ID=508
25 http://ideas.repec.org/a/tsj/stataj/v8y2008i1p105-121.html

Heat maps
Heatmaps are two-dimensional maps where the color intensity represents the underlying
data. The above table on the right can be thought of as a heatmap: the darker orange
colors represent larger values. When choosing the different colors to use, designers rely
on color theory. ColorBrewer is a useful website for making sure that viewers can clearly
distinguish differences in your data.26
Choropleth Maps (Color Maps)
Choropleth maps are a specific type of heat map where the two-dimensional object is a
geographical map. The map is then painted with color based upon the intensity of the
underlying variable. Often darker colors represent larger values of the underlying
variable. This is a great way to visually represent data that varies geographically. The
example below was produced with Stata and comes from some foreclosure data I have
by county. The data represent the number of foreclosure filings as a percentage of
housing units in each county for 2007; the darker the shading of the county, the
higher the rate of foreclosure filings in that county. Juneau County sticks out as the
obvious county with the highest rate of foreclosure filings.

A similar graph for the state of Wisconsin is below. Note that the shading has changed
relative to the previous graph and is now based upon different intervals.
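How much the choice of intervals matters can be seen by assigning the same values to shade classes under two different sets of cut points. The sketch below is a generic illustration in Python. The county names, rates, and cut points are hypothetical and are not the intervals used in the Stata maps.

```python
from bisect import bisect_right


def shade_class(value, cut_points):
    """Return the index of the shade class that value falls in.

    cut_points must be in ascending order; class 0 is the lightest shade.
    A value equal to a cut point falls in the higher class.
    """
    return bisect_right(cut_points, value)


# Hypothetical foreclosure filing rates (percent of housing units).
rates = {"County A": 0.21, "County B": 0.48, "County C": 0.95}

coarse = [0.5]              # two classes
fine = [0.25, 0.5, 0.75]    # four classes

for county, rate in rates.items():
    print(county, shade_class(rate, coarse), shade_class(rate, fine))
```

Under the coarse scheme Counties A and B share the lightest shade. Under the finer scheme they separate, so the same data can tell a visibly different story.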

26 The website can be found here:
http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer_intro.html

While I used a statistics program (Stata) to generate these graphics, there are many
opportunities for producing your own choropleths on the web. Google Documents has
added its own visualization tools, which include the ability to create choropleths for
different countries.27 These maps, and the geographic presentation of data generally,
intersect with the rapidly growing use of Geographic Information Systems (GIS)
in economic geography. Can you imagine the marketing uses for this type of
information?
There are of course problems with these types of maps as well. They can mislead a
viewer. The geographic area may be completely unrelated to the area at risk. For
example, if the map represents foreclosure rates, as these do, you might think Juneau
County represents a large economic problem for the region. However, the reality is
that the population of Juneau County is quite small relative to La Crosse, and while the
foreclosure rate might be high, the total number of foreclosures is still quite small,
because there are fewer houses in that county relative to some of the other counties.
The fundamental problem is that the graphic invites you to infer economic importance
in proportion to geographic size, when this is not true. One solution is to distort the
geographic area based instead on the metric of interest.
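The gap between a high rate and a large count is simple arithmetic, sketched below in Python. The figures are hypothetical, chosen only to illustrate the point. They are not the actual filings or housing counts for any Wisconsin county.

```python
# Hypothetical numbers, chosen only to illustrate the point.
counties = {
    "Small county": {"housing_units": 12_000, "filings": 120},
    "Large county": {"housing_units": 48_000, "filings": 240},
}


def foreclosure_rate(county):
    """Filings as a percentage of housing units."""
    return 100 * county["filings"] / county["housing_units"]


for name, data in counties.items():
    print(f"{name}: rate {foreclosure_rate(data):.2f}%, filings {data['filings']}")
```

Here the small county has twice the rate but half the filings. A map shaded by rate would paint it darkest even though most of the foreclosures are elsewhere.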
Cartograms (Distorted Maps)
Another example of using colors and maps comes from the following distorted maps,
where the distortion is based upon some underlying variable, in this case alcohol
consumption. Here the color only serves to demarcate the different countries. Rather
than letting color intensity convey the values of the underlying variable, the creators have
distorted the size of each country in proportion to its alcohol consumption. There are
some people who feel cartograms hide more than they reveal.28

27 Details on producing these maps can be found here:
http://documents.google.com/support/spreadsheets/bin/answer.py?answer=91599
and here: http://googlesystem.blogspot.com/2008/02/data-visualization-google-gadgets.html

Alcohol Consumption (2001)29

Another example of a cartogram comes from the recent election.30 Below is a
reinterpretation of the simplistic red/blue map you might have seen on TV or in the
newspaper. Now the colors are shaded based upon the vote, rather than simply one
color for each party based upon the majority vote in that state. The states are also
distorted by the number of votes cast in that state.

Compare that to the traditional depiction:


28 http://flowingdata.com/2008/11/13/alternative-to-cartograms-using-transparency/
29 The distorted maps presented here come from the following article:
http://www.dailymail.co.uk/news/article-439315/How-world-really-shapes-up.html. Producing the distorted
cartograms involves a substantial knowledge of programming and graph theory.
30 http://www-personal.umich.edu/~mejn/election/2008/

Treemaps
Treemaps are another type of heat map, well suited for hierarchical data. The classic
example on the internet is the smartmoney.com map of the market.31 Here the
hierarchy from the bottom up is as follows: start with individual stocks, which are grouped
by company; each company is represented by its market capitalization (outstanding shares
of that company times share price). A higher market capitalization for the firm means a
larger area for its box. This would be the initial box. Then companies are further grouped
together into a larger box by industry. The small boxes are then colored based upon the
percentage gain or loss on the day, with green representing gains and red representing
losses. Visually it is very important to distinguish gains from losses by different colors.
That was the major shortcoming of a recent NY Times32 heatmap.

31 SmartMoney's map of the market is updated with a 15 minute delay. The site is here:
http://www.smartmoney.com/map-of-the-market/
32 The graphic concerns the performance of the economy under different Presidents and it can be seen
here: http://www.nytimes.com/interactive/2008/10/18/business/20081019-metrics-graphic.html

A recent bad day on Wall Street is captured by the following33.

It is possible to produce treemaps of your own, whether through Microsoft Research's
Excel add-in34 or the use of IBM's web software ManyEyes.35 There are several examples
of data you may have which could be represented by a treemap. Let's say you are
working on a project which is looking at students' choice of major. The hierarchy from
top down could be:
College > Major > number of students
So the number of students determines the size of the box for each major. Then the
majors are collected within the larger box of the college within which they are offered.
The boxes could be colored by many different things. For example, let's say you were
trying to get a sense of how many students change their major and what they change it
to. You could then color the boxes by the percentage of the people in that major who
have always had that major, or by the percentage that changed to that major within the
last year.

33 These data come from http://www.uie.com/brainsparks/2008/09/30/seeing-red-smartmoneycoms-mapof-the-market/
34 Microsoft provides an add-in for treemaps: http://www.gilsmethod.com/node/81

Another example could be looking at the time students spend in different activities. Let's
say you ask them the average number of hours per week they spend doing several
things, such as studying, going to class, reading, writing, etc. Again it would be possible
for you to break these down. You could make the first level of boxes equal in size to
the average percentage of time spent in the particular activity. The next level of boxes
would involve grouping the activities into broader areas, say academic versus
non-academic. Basically any data that can be grouped through some sort of hierarchy will
make a good treemap.
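Before any drawing happens, a treemap needs the box sizes at each level of the hierarchy. The sketch below computes them in Python for the college and major example, using hypothetical enrollment counts. It only prepares the sizes. Laying the boxes out on screen is a separate step handled by the treemap tool.

```python
# Hypothetical enrollment counts: college -> major -> number of students.
enrollment = {
    "Business": {"Accounting": 300, "Economics": 150, "Marketing": 250},
    "Science": {"Biology": 400, "Physics": 100},
}


def box_sizes(tree):
    """Return (college totals, each major's share of all students)."""
    college_totals = {
        college: sum(majors.values()) for college, majors in tree.items()
    }
    grand_total = sum(college_totals.values())
    major_shares = {
        (college, major): count / grand_total
        for college, majors in tree.items()
        for major, count in majors.items()
    }
    return college_totals, major_shares


totals, shares = box_sizes(enrollment)
print(totals)
```

Each major's share sets the area of its small box, and the college totals set the area of the enclosing boxes. A second variable, such as the percentage of students who switched into the major, would then drive the color.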
Some examples of brilliant dynamic web treemaps are provided by the New York Times
article on changes in inflation36. The New York Times also uses treemaps in a recent
graphic depicting the year of heavy losses on Wall Street37.
Bump Charts
Bump charts are a good way of showing changes in rank order. Below, The New
York Times talks about the challenges which face the US and other countries on infant
mortality.38 Where would you rather have an infant born, the US or Singapore?
According to the chart, Singapore. However, remember that this is measuring the
number of deaths of infants (one year of age or younger) per 1000 live births. We are
more likely than other countries to have successful preterm births, but this group is
very much at risk for early death.
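A bump chart plots ranks rather than raw values, so the first step is converting each year's values into a rank order. The sketch below shows that step in Python. The countries and rates are hypothetical placeholders, not the figures in the New York Times chart, and ties are not given any special treatment.

```python
def rank_by_year(rates_by_year):
    """For each year, rank countries from lowest rate (rank 1) to highest."""
    ranks = {}
    for year, rates in rates_by_year.items():
        ordered = sorted(rates, key=rates.get)
        ranks[year] = {
            country: position for position, country in enumerate(ordered, start=1)
        }
    return ranks


# Hypothetical infant mortality rates (deaths per 1000 live births).
rates_by_year = {
    1960: {"A": 26.0, "B": 35.0, "C": 30.0},
    2004: {"A": 6.8, "B": 2.0, "C": 4.5},
}
print(rank_by_year(rates_by_year))
```

In this made-up example every country's rate falls, yet country A slips from first to last. That is exactly the kind of change a bump chart highlights and a chart of raw rates can hide.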

35 The service is available here:
http://services.alphaworks.ibm.com/manyeyes/page/Treemap_for_Comparisons.html
36 A look at recent inflation:
http://www.nytimes.com/interactive/2008/05/03/business/20080403_SPENDING_GRAPHIC.html?scp=1&sq=inflation%20chart&st=cse
37 http://www.nytimes.com/interactive/2008/09/15/business/20080916-treemap-graphic.html
38 http://www.nytimes.com/2009/04/07/health/07stat.html?ref=science

Word Clouds
Word clouds are good for representing responses to open ended questions39.
This is from the following question:
Looking ahead, which would you say is more likely - that in the country as a whole we'll
have continuous good times during the next five years or so, or that we will have
periods of widespread unemployment or depression?
A. Good times
B. Widespread unemployment or depression
C. Other, please specify
The word cloud is composed of the responses to the "C. Other, please specify" answer; I
have removed the first two.
39 An easy to use web site, http://wordle.net/, allows you to produce your own word clouds.

There are problems with this type of presentation. First, since the responses to the
"other" answer were actually short phrases, we don't really capture the full phrase, but
rather the frequency of the words. As a demonstration of this problem, let's say 10
people said "good times" and 10 said "bad times". Since the word "times" appears in
both, it will be the most frequent response (appearing 20 times) and therefore the
largest. But that doesn't tell us much about the sentiment being conveyed by the
respondents.
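The "good times" and "bad times" example can be reproduced directly. This short Python sketch counts individual words the way a word cloud tool typically would, assuming words are split on whitespace:

```python
from collections import Counter

responses = ["good times"] * 10 + ["bad times"] * 10

# Count individual words, ignoring which phrase they came from.
word_counts = Counter(word for response in responses for word in response.split())
print(word_counts.most_common())
# [('times', 20), ('good', 10), ('bad', 10)]
```

The largest word in the cloud would be "times", which says nothing about whether respondents were optimistic or pessimistic.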

This is solved below by tying all the words of a single response together with the tilde
(~). Joining the words with a ~ like this (joined~words) allows Wordle to produce a
phrase cloud, which is a great way of visualizing responses to questions with 5 or so
categories, where a phrase represents each category. This is very easy to do in Excel:
just highlight the column and do a find and replace where you put a blank space in the
find box and a ~ (tilde) in the replace box. Then copy and paste the text into Wordle. Done.
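The same find and replace can be done outside Excel. The following Python sketch joins the words of each response with tildes, using answer categories from the property tax question as sample input. The helper name `to_phrase` is introduced here for illustration.

```python
from collections import Counter


def to_phrase(response):
    """Join the words of one response with tildes so it is treated as one token."""
    return "~".join(response.split())


responses = ["Much too high", "About right", "Much too high", "Somewhat too low"]
phrases = [to_phrase(r) for r in responses]
print("\n".join(phrases))
print(Counter(phrases).most_common(1))  # [('Much~too~high', 2)]
```

The printed lines can be pasted into Wordle as they are. Counting the joined phrases first is also a quick check that each category has been kept whole.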

The other problem with this presentation is that it doesn't visually direct and steer the
eye while making the point. Your eye wanders all over the place.
Using the question:
When you think about the property taxes you or your landlord pay on the home in
which you live and the services you receive for those taxes would you say property taxes
in Wisconsin (or your state of residence) are much too high, somewhat too high, about
right, somewhat too low or much too low?
Answers that are joined are
a. Much too high
b. Somewhat too high
c. About right
d. Somewhat too low
e. Much too low
f. Other

One could easily list the words by frequency from greatest to least, but word clouds are
popular because they are more than just data: they are art. They invite the observer in,
even if the observer gets a little lost in the presentation. Sometimes the efficient
conveying of information is sacrificed for the visual aesthetic of good design. Below is an
example where the art matters more than some of the underlying data.40

40 This graphic comes from the website http://www.pitchinteractive.com/election2008/. More artistic
visualizations can be found here: http://www.visualcomplexity.com/vc/ and Slate has an excellent collection
of artistic visualizations here: http://www.slate.com/id/2197749/

The edge of the doughnut lists the names of donors to the 2008 presidential
campaigns. Clearly in this level of presentation you cannot read the names. However it
still gets some ideas across, like the disproportionate amount of funds raised by Obama,
relative to McCain.
Bubble Charts
Bubble charts allow you to present three variables in two dimensions. They are basically
traditional XY scatter plots, where the size of the bubble is proportional to a third
variable. In the case below, the scatter plot represents the unemployment rate and
foreclosure rate for each of the Wisconsin counties in the 7 Rivers Region, and the size
of the bubble is proportional to the population of the county. It is a static presentation
for one year, 2007.

[Figure: bubble chart titled "7 Rivers Region 2007": unemployment rate (vertical axis, 0 to 8)
against foreclosure rate (horizontal axis, 0 to 0.01), with bubbles for Jackson, Juneau,
La Crosse, Monroe, Trempealeau, and Vernon counties]
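One detail worth deciding when building a bubble chart is what "size" means. The sketch below assumes the bubble's area, not its radius, should be proportional to the third variable, which means scaling the radius by a square root. That is a design choice made for this illustration rather than something the chart above is known to do, and the populations are hypothetical.

```python
import math


def bubble_radii(values, max_radius=30.0):
    """Scale radii so that bubble AREA is proportional to each value."""
    largest = max(values)
    return [max_radius * math.sqrt(v / largest) for v in values]


# Hypothetical county populations.
populations = [100_000, 25_000, 4_000]
radii = bubble_radii(populations)
print([round(r, 1) for r in radii])  # [30.0, 15.0, 6.0]
```

A county with a quarter of the population gets half the radius and therefore a quarter of the area. Scaling the radius directly would make the difference between large and small counties look far bigger than it is.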

Another example, which highlights the problem of too many colors competing for
attention, can be found below. In example A the mind gets lost, whereas example B does
a good job of highlighting, with context, the data of the orange circle.41

41 http://charts.jorgecamoes.com/is-data-visualization-useful/

Dynamic bubble charts allow you to plot the above for different years, and then you can
watch the data change over the years. I've produced some examples using the foreclosure
data to give you another idea for presenting the data.42
One of the best examples of dynamic bubble charts can be found at Gapminder.43 How
would you insert them into presentations? In the past I have posted them to a webpage,
and rendered them separately, or within PowerPoint. Obviously this type of
presentation is not possible (currently) in a written report. I imagine that technology is
not far behind, as you could imagine Amazon's Kindle bridging the gap.
These are beautiful graphics from the New York Times,44 but they might be difficult for
you to re-create, though they should get you thinking about how data can be presented in
a way that is graphically pleasing and at the same time informative.
Presenting data in a written format requires different techniques than presenting the
same data orally. In a written piece the reader has more time to dig into the
data, so the graph or chart can be more complex, as the NY Times pieces are.
In the case of a PowerPoint, keep it simple and active. When science meets art, as in the
case of graphs and design, it is important to realize there will be differences. There is
less likely to be an objective standard. Some arguments will be over design, and some
over the content. Always ask yourself who your audience is, what the point of the graph
is, and whether your design is in fact conveying what you want it to.45 The following
represents some important differences in preferences, but also important differences in
terms of information presented. Some other tips can be found at the links.46
Dynamic/Interactive Graphs
These graphs can be dynamic in the sense that they are constantly updated and changing
either due to the influx of new data or interactive manipulations by the viewer.

42

http://www.uwlax.edu/faculty/brooks/prof/charts/foreclosure.htm and
http://www.uwlax.edu/faculty/brooks/prof/charts/foreclosure-state.htm
43
http://googlegadgetsapi.blogspot.com/2008/06/spreadsheet-gadgets-free-dynamic-data.html
http://code.google.com/apis/visualization/documentation/gadgetgallery.html
44
Movies. http://www.nytimes.com/interactive/2008/02/23/movies/20080223_REVENUE_GRAPHIC.html
NY Times on spending http://www.nytimes.com/interactive/2008/09/04/business/20080907-metricsgraphic.html Drug admts
http://www.nytimes.com/2008/06/14/opinion/14blow.html?_r=3&oref=slogin&oref=slogin&oref=slogin
45
http://sethgodin.typepad.com/seths_blog/2008/07/the-three-laws.html
http://sethgodin.typepad.com/seths_blog/2008/07/bar-graphs-vs-p.html
http://peltiertech.com/WordPress/2008/07/12/bar-graphs-vs-pie-charts/
http://www.perceptualedge.com/blog/?p=247
http://blog.xlcubed.com/chart-rules-as-simple-as-possible-but-not-any-simpler/
46
http://www.macworld.com/article/134708/2008/07/chartsandgraphs.html?t=103
http://www.giantflightlessbirds.com/workshops/better_graphs.pdf
some excel tips http://charts.jorgecamoes.com/category/how-to-and-tips/
http://services.alphaworks.ibm.com/manyeyes/app and another link
http://www.decisionsciencenews.com/?p=475


Data Visualization in Seminars/Talks/Presentations.


When the audience is in front of you rather than at home in front of their computer,
you are responsible for grabbing their attention and keeping them awake.
Here is an example of the principle of simplicity in the presentation of data in a
lecture/talk/seminar. The chart below contains three values: The percentage of water in
the body, the brain and the blood. Put yourself in the shoes of the audience if you saw
this chart. Interesting? Mind numbing?

[Figure: bar chart titled "PercentWater"; vertical axis from 0 to 90; one bar each for body, brain and blood]

Now what if I presented these same three pieces of data in three different PowerPoint
slides?


We could present the boring bar chart. It's simple and easy to understand, but not visually
stimulating. It is more data dense than the three slides, yet I think you will agree the
three slides would have a bigger impact in a presentation. They engage the audience
visually in a way the bar chart does not. The slides came from the award-winning
presentation entitled Thirst47.
Another must-see slide presentation, entitled Death by PowerPoint48, is available at
slideshare.com. Garr Reynolds also provides a good section of his book Presentation
Zen through his blog, where he details the 4 principles of design: Contrast, Repetition,
Alignment, and Proximity49.
Contrast

47

Thirst won the 2008 award for the World's Best Presentation from Slideshare.com
http://www.slideshare.net/jbrenman/thirst
48
Slideshare has several good presentations on how to present. Death by PowerPoint
http://www.slideshare.net/thecroaker/death-by-powerpoint and Presenting With Text
http://www.slideshare.net/girba/presenting-with-text
49
Part of Chapter 6 can be downloaded here http://www.presentationzen.com/chapter6_pages.pdf


Repetition

Alignment and Proximity


When thinking about PowerPoint design, think about other technology. What do we
love about Apple? Simple design. What do we love about Facebook? The design and
interface are much cleaner than most MySpace pages, though sadly that is changing50.
Google redefined simple and clean, and I am convinced that it helped fuel their early
success. Did I mention I think simplicity is important? Avoid all of the visual crap that
Microsoft seems to think is important.
Good presentations are about more than just good slide design. They are also about
being a good speaker and telling a good story. How do you learn this? Watch a few
great presentations. Pay attention to how they interact with the audience, how they've
50

See this article http://www.readwriteweb.com/archives/is_facebook_becoming_myspace.php


organized their thoughts. A great presentation by Hans Rosling can be found in the link
below51. In fact, most of the TED talks are useful examples of good, succinct
presentations52, 53.
Some general principles of slide design by Garr Reynolds at Presentation Zen can be
found at the link54. He makes the important point that slides should have a high signal to
noise ratio55.
Nancy Duarte of Duarte Design, responsible for designing some of the best TED talks
and Al Gore's An Inconvenient Truth, provides a wonderful webinar on using
PowerPoint56. Nancy also has an excellent book entitled Slide:ology.57
Some insights on the presentations of Steve Jobs can be found at the link58.
And please, no bullet points59.

51

http://www.youtube.com/watch?v=hVimVzgtD6w
52
http://www.ted.com/
53
Additional notes on good presentation organization can be found here:
http://www.extremepresentation.com/
54
http://www.presentationzen.com/presentationzen/2008/08/learning-from-the-design-around-youikea.html
55
http://www.presentationzen.com/presentationzen/2007/03/a_few_weeks_ago.html
56
http://www.vizthink.com/blog/2008/06/18/webinar-creating-powerful-presentations-with-nancy-duarte/
57
http://www.amazon.com/slide-ology-Science-CreatingPresentations/dp/0596522347/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1238982954&sr=8-1
58
http://images.businessweek.com/ss/09/09/0929_jobs_presentations/1.htm
59
http://aralbalkan.com/1286


Finally, lest you think there is no fun in data visualization, here are some funny graphs60.
Some Dos and Don'ts
I hate to give you a list of things to do and things not to do because, as with any rules,
there are times when they should be broken. However, by giving you some rules, I hope
you will break them only when you have good reason to.
Don't
Use 3-D graphics in Excel
Use Microsoft clip art
Use a PowerPoint design template
Read your presentation
Use bullet points

Do
Use Pictures
Use repetition in your design
Practice/rehearse presentation
Keep each slide to one idea

References and Endnotes


Some useful links to data visualization blogs and leading thinkers in the infovis world:
http://junkcharts.typepad.com/
http://www.visualcomplexity.com/vc/
http://www.edwardtufte.com/tufte/
http://www.perceptualedge.com/
http://infoclarity.blogspot.com/
http://eagereyes.org/
http://charts.jorgecamoes.com/
http://visualizeit.wordpress.com/
http://www.visualizingeconomics.com
http://www.juiceanalytics.com/writing/
Presentation Related Blogs
http://blog.duarte.com/
http://www.presentationzen.com/presentationzen/

60

http://graphjam.com/


Duarte, N. (2008). Slide:ology: The Art and Science of Creating Great Presentations: O'Reilly.
Few, S. (2004). Show Me the Numbers: Designing Tables and Graphs to Enlighten (1st ed.). Oakland, CA:
Analytics Press.
Few, S. (2006). Information Dashboard Design: The Effective Visual Communication of Data (1st ed.). Beijing ;
Cambridge [MA]: O'Reilly.
Reynolds, G. (2008). Presentation Zen: Simple Ideas on Presentation Design and Delivery. Berkeley, CA: New
Riders.
Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, Conn.: Graphics Press.
Tufte, E. R. (2003). The Cognitive Style of PowerPoint. Cheshire, Conn.: Graphics Press.
Tufte, E. R. (2003). Envisioning Information (9th printing, Aug. 2003. ed.). Cheshire, Conn.: Graphics Press.
Tufte, E. R. (2006). Beautiful Evidence. Cheshire, Conn.: Graphics Press.
Tufte, E. R. (2007). Visual Explanations: Images and Quantities, Evidence and Narrative (8th printing, with
revisions, June. 2007. ed.). Cheshire, Conn.: Graphics Press.


Appendix: TIPS for Excel 2007


How to change the axis of a chart to the logarithmic scale.
From http://office.microsoft.com/en-us/excel/HP030656791033.aspx
Make changes to the scales of value axes
1. On a chart sheet or in an embedded chart, click the value (y) axis that you want to
change.
2. On the Format menu, click Selected Axis.
3. On the Scale tab, do one of the following:

To change the number at which the value (y) axis starts and ends, type a
different number in the Minimum box or the Maximum box.

To change the interval of tick marks and gridlines, type a different number in
the Major unit box or Minor unit box.

To change the units displayed on the value (y) axis, click the units that you
want or type a numeric value in the Display units list.
To show a label that describes the units expressed, select the Show display
units label on chart check box.
Tip If your chart values consist of large numbers, you can make the axis text
shorter and more readable by changing the display unit of the axis. For
example, if the chart values range from 1,000,000 to 50,000,000, you can
display the numbers as 1 to 50 on the axis and show a label that indicates that
the units express millions.

To change the value (y) axis to logarithmic, select the Logarithmic
scale check box.

To reverse values so that you can flip bars or columns or other data markers,
select the Values in reverse order check box.
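
To see what two of these options do to the numbers themselves, here is a minimal Python sketch. It is not part of the Microsoft instructions; the function names display_units and log_axis_position are hypothetical and chosen only for illustration. The first rescales values the way the Display units option relabels the axis, and the second computes how far along a base-10 logarithmic axis a value would sit.

```python
import math


def display_units(values, unit=1_000_000):
    """Rescale values for axis labels, e.g. 50,000,000 becomes 50.0 when the unit is one million."""
    return [v / unit for v in values]


def log_axis_position(value, axis_min, axis_max):
    """Return the fraction of the way along a base-10 log axis (0.0 at axis_min, 1.0 at axis_max)."""
    if value <= 0 or axis_min <= 0 or axis_max <= axis_min:
        raise ValueError("a logarithmic axis needs positive values and axis_max > axis_min")
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span


# Hypothetical chart values in the range used in the tip above.
values = [1_000_000, 10_000_000, 50_000_000]
print(display_units(values))

# On a log axis running from 1 million to 100 million, 10 million sits about halfway along,
# whereas on a linear axis it would sit roughly a tenth of the way along.
print(log_axis_position(10_000_000, 1_000_000, 100_000_000))
```

The point of the second function is that equal distances on a logarithmic axis represent equal ratios rather than equal differences, which is why widely ranging values stay readable.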


How to use the Histogram add-in in Excel

http://support.microsoft.com/kb/214269
SUMMARY
This step-by-step article describes how to create a histogram with a chart from a sample set of data. The Analysis ToolPak that is included with Microsoft Excel includes a Histogram tool.


Verify Installation of the Analysis ToolPak


Before you use the Histogram tool, you need to make sure the Analysis ToolPak Add-in is installed. To verify whether the Analysis ToolPak is installed, follow these steps:

1. In Microsoft Office Excel 2003 and in earlier versions of Excel, click Add-Ins on the Tools menu.
In Microsoft Office Excel 2007, follow these steps:
a. Click the Microsoft Office Button, and then click Excel Options.
b. Click the Add-Ins category.
c. In the Manage list, select Excel Add-ins, and then click Go.
2. In the Add-Ins dialog box, make sure that the Analysis ToolPak check box under Add-Ins available is selected. Click OK.
NOTE: In order for the Analysis ToolPak to be shown in the Add-Ins dialog box, it must be installed on your computer. If you do not see Analysis ToolPak in the Add-Ins dialog box, run Microsoft Excel
Setup and add this component to the list of installed items.


Create a Histogram

1. Type the following in a new worksheet:

A1: 87     B1: 20
A2: 27     B2: 40
A3: 45     B3: 60
A4: 62     B4: 80
A5: 3      B5:
A6: 52     B6:
A7: 20     B7:
A8: 43     B8:
A9: 74     B9:
A10: 61    B10:

2. In Excel 2003 and in earlier versions of Excel, click Data Analysis on the Tools menu.
In Excel 2007, click Data Analysis in the Analysis group on the Data tab.
3. In the Data Analysis dialog box, click Histogram, and then click OK.
4. In the Input Range box, type A1:A10.
5. In the Bin Range box, type B1:B4.
6. Under Output Options, click New Workbook, select the Chart Output check box, and then click OK.
A new workbook with a Histogram table and an embedded chart is generated.


Based on the sample data from step 1, the Histogram table will look like the following table:

A1: Bin     B1: Frequency
A2: 20      B2: 2
A3: 40      B3: 1
A4: 60      B4: 3
A5: 80      B5: 3
A6: More    B6: 1

Your chart will be a column chart that reflects the data in this Histogram table.
Excel counts the number of data points in each data bin. A data point is included in a particular data bin if the number is greater than the bin's lower bound and less than or equal to its upper bound.
In the example here, the bin that corresponds to data values from 0 to 20 contains two data points, 3 and 20.
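
The counting rule described above can be sketched in a few lines of Python. This is an illustration of the rule as stated, not Excel's own code; the function name histogram_counts is hypothetical, and the data and bins are the sample values from step 1.

```python
from bisect import bisect_left


def histogram_counts(data, bin_edges):
    """Count values per bin using the rule: lower bound < value <= upper bound.

    Returns one count per bin edge, plus a final count for values above the
    last edge (the "More" row). The first bin has no lower bound.
    """
    edges = sorted(bin_edges)
    counts = [0] * (len(edges) + 1)
    for value in data:
        # bisect_left finds the first edge that is >= value, so a value
        # equal to an edge is counted in that edge's bin.
        counts[bisect_left(edges, value)] += 1
    return counts


data = [87, 27, 45, 62, 3, 52, 20, 43, 74, 61]  # column A from step 1
bins = [20, 40, 60, 80]                         # column B from step 1

counts = histogram_counts(data, bins)
for label, count in zip(bins + ["More"], counts):
    print(label, count)
```

Note that the value 20 lands in the bin labelled 20 rather than the bin labelled 40, because each bin includes its upper bound.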
If you omit the bin range, Excel creates a set of evenly distributed bins between the data's minimum and maximum values.
NOTE: You will not be able to create the Histogram chart if you specify the options (Output range or New worksheet ply) that create the Histogram table in the same workbook as your data.
