Infographics course book overview

Infographics
course book
Version 1.3 (21 November 2014)
Free University of Bolzano Bozen – Paolo Coletti
Introduction
This book contains the infographics course’s lessons held at the Free University of Bolzano Bozen.
This book is in continuous development; please take a look at its version number, which marks important
changes.
Disclaimers
This book is designed for novice data analysts. It contains oversimplifications of reality and many technical
details are purposely omitted.
Table of Contents
INTRODUCTION
TABLE OF CONTENTS
1. BASIC NOTIONS
1.1. FUNDAMENTAL SCOPES
1.2. VARIABLES
1.3. SCALES
2. TRADITIONAL GRAPHS
2.1. ONE NOMINAL VARIABLE
2.2. ONE SCALE VARIABLE
2.3. NOMINAL VERSUS NOMINAL VARIABLES
2.4. NOMINAL VERSUS SCALE VARIABLES
2.5. SCALE VERSUS SCALE VARIABLES
3. SPECIFIC CHARTS
3.1. DATA MAP
3.2. TIME SERIES
4. MISLEADING FACTORS
4.1. DISTRACTIBLE ELEMENTS
4.2. ELEMENTS IN THE WRONG PLACE
4.3. WRONG PROPORTIONS BETWEEN DATA AND SIZES
4.4. DIFFERENT UNITS OF MEASURE
4.5. NOT STARTING FROM ZERO
4.6. TWO AND THREE DIMENSIONAL GROWTH

Paolo Coletti Infographics course book
1. Basic notions
1.1. Fundamental scopes
Every time we need to represent anything, be it a picturesque view through a painting, numbers through a
table or, as will be the case in this book, numbers through a picture, we must never forget the scope of
our work. To help us in this task, the following questions give good hints. They can be used every time we
are faced with a representation, to point out its bad aspects and to find possible corrections and
improvements.
What is represented?
Representing something without prior knowledge of what is going to be represented is clearly pointless,
but there are numerous examples of representations built only to impress the viewer, without any
intention of really displaying something.
The topic which is displayed must be clear in the designer’s mind, and so must the point to be emphasized.
The whole image must be dedicated to these two tasks, without adding worthless extras and without
depriving these two points of any of their aspects for the sake of secondary scopes.
Is something misleading?
The most widespread problem in data visualization is misleading graphs, which deliberately or through
ignorance display information in a wrong way to attract the viewer’s attention to the scope of the
representation in an unfair way. The typical example of error is wrong proportions, that is, displaying
elements of the picture not in relative scale to each other. This may seem a childish trick, easy to discover;
however, when it is done using multidimensional pictures it needs a careful eye to be spotted, as can be
seen in section 4.
Is something useless?
Too many graphical representations incorporate elements which have nothing to do with the main scope
and whose only task is to embellish the graphics. Very often these elements distract the viewer’s attention,
when they do not move the focus away completely. In some cases they make parts of the representation
incomprehensible, as can be the case when fancy fonts are used to write labels and numbers.
Is something missing?
Very often a graphical representation also needs text and numbers. Even though most of it can be stored in
the caption, it is necessary that axes contain the units of measure and their scale, that colors have a
proper legend and that, in case useful information is omitted, the omission be explicitly cited and explained.
A scaleless or unitless graph is useless even when all the elements to compare are in scale: any difference
which seems huge can instead be irrelevant. A representation with missing relevant details immediately
raises the suspicion that the designer wanted to hide something.
Last but not least, a time indication is fundamental. Data will change in time, while the representation stays
the same (unless it is fully interactive) and can be looked at even many years later. With a missing time
indication the representation is worthless: it could refer to right now as well as to 3 millennia ago.
Can other information be added?

This question has two aspects. The positive one is checking whether the graph can hold other useful
details without driving attention away from the main points and without decreasing its
comprehensibility. The negative one is making sure that no useless information is stored inside the
graph: it is better to keep it simple than to overload it with useless elements.
1.2. Variables
We follow Stevens’ theory¹ and we divide the data that we have into three big categories, which are
treated in different ways both by statistics and by graphical representations.
Scale variables, also called numeric variables, are data expressed in a numerical format, be it continuous or
discrete, on which mathematical operations make sense. They are usually physical measures, such as time,
age, weight, distance in kilometers, length, number of children, GDP.
Nominal variables, also called categorical variables, are data which represent categories. No mathematical
operation makes sense on them, even though sometimes numbers are used to identify the categories.
Their only scope is dividing the population into categories. Examples are country of residence, sex, degree
course. Scale variables could in principle also divide the population into categories but, since they have
many possible values (even infinitely many when the variable is continuous), using them in this way
would result in a huge number of categories, often with one or zero cases in
each category.
The border between nominal and scale variables is however not so sharp. Midway between them we find
ordinal variables, which define only an order on the things they represent. For example, the arrival
position in a running competition is clearly ordinal, as it defines the order but no mathematical calculation
can be done on it (we cannot say that the sixth one is 3 times the second one). A typical example is the
5‐element Likert scale used in questionnaires, a set of predetermined answers going from very negative,
indicated with 1, to very positive, indicated with 5. It is clear that answering 4 is always more positive than
answering 2, but we cannot say that it is twice as positive.
Ordinal variables can always be used as nominal variables, as they have a limited number of values and
they divide the population into categories. Sometimes, when they have several values and when the
numbers used to represent the order can somehow also carry a mathematical meaning, they are used as
scale variables.
1.3. Scales
Unless extravagant solutions are absolutely necessary, there are two scales which are typical in
representations of numerical data.
The linear scale is the standard choice: every interval on the axis represents a difference in values which is
proportional to its length, with this proportion fixed for the entire graph. It is often a good idea to include
also the zero level in this scale, in such a way that the viewer is not misled into thinking that apparently
huge differences are really so big, as in section 4.5. When the zero level is far away and is not useful for the
problem (for example, because the problem itself concentrates on the differences and not on the absolute
values), it is necessary to make very evident the fact that the scale does not start from 0, using graphical
tools or text.
The alternative scale is the logarithmic scale, where every interval of a given length represents
multiplication by a fixed factor, so that values grow exponentially along the axis. In the logarithmic scale
the logarithm of the values is displayed, but the axis still shows the original value, not its logarithm. Given
the features of the logarithm, no value can be zero or negative, as its logarithm does not exist. Traditionally
the logarithmic scale in base 10 is used, as it is easier for humans to read, but using any other base such as
2 or e is equivalent. It is vital to indicate at least three values on the axis, to give the viewer the correct
impression that numbers are growing exponentially. In the logarithmic scale the usual starting value is 1,
as 0 is not representable.
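As a rough illustration of the logarithmic scale described above, the following Python sketch computes where a few axis labels would be drawn. The function name `log_position` and the tick values are assumptions chosen for the example, not part of the course material.

```python
import math


def log_position(value, base=10.0):
    """Position of `value` on a logarithmic axis, measured in units of `base`."""
    if value <= 0:
        # Zero and negative values have no logarithm, so they cannot be drawn.
        raise ValueError("logarithmic scale needs strictly positive values")
    return math.log(value) / math.log(base)


# Hypothetical axis labels: the viewer reads the original values...
ticks = [1, 10, 100, 1000]
# ...but the drawing positions are their logarithms.
positions = [log_position(t) for t in ticks]
# Distance between consecutive labels: the same for every tenfold increase.
steps = [b - a for a, b in zip(positions, positions[1:])]

# Changing the base only stretches the axis by a constant factor.
stretch = [log_position(t, base=2) / log_position(t, base=10) for t in ticks[1:]]
```

The equal values in `steps` show why at least three labels are useful: only then can the viewer see that equal distances correspond to equal ratios. The constant value in `stretch` is one way to see that the choice of base does not change the shape of the graph.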

¹ Stevens, S. S. (1946); On the Theory of Scales of Measurement; Science, vol. 103, n. 2684, pages 677–680.
2. Traditional graphs
How one or several variables can be graphically visualized depends strongly on the variable type.
2.1. One nominal variable
To represent one nominal variable there are two basic distinct choices and a bunch of variations which are technically identical.
The first choice is the column plot, sometimes improperly called histogram. It represents on the horizontal axis the variable’s categories and on the vertical axis the frequency of cases in each category.
Millions of patent applications in Japan, US, Europe between 19988 and 2007, source WIPO.
Number of participants by race that attend the workshop on Social Media in 2011, source http://lsuagcenterode.wordpress.com.
The second choice is the pie chart. It depicts a circle divided into slices, each one proportional to the number of cases in that category. Comparing it to the column plot we observe that here the visual information on the absolute number of cases is lost, while strong emphasis is put on the percentage of cases.
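A small Python sketch can make the loss of absolute information concrete. The function `pie_slices` and the category counts are invented for this illustration.

```python
def pie_slices(counts):
    """Map each category to (percentage of cases, slice angle in degrees)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("at least one case is needed")
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }


# Invented counts for three hypothetical categories.
small_sample = {"A": 50, "B": 30, "C": 20}
large_sample = {"A": 500, "B": 300, "C": 200}  # ten times as many cases

slices_small = pie_slices(small_sample)
slices_large = pie_slices(large_sample)
```

Both samples produce the same slices, so the two pie charts would look identical even though one describes ten times as many cases. A column plot of the same data would show the difference on its vertical axis.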
The third choice is the radar graph, which is a good choice when the number of categories is at least five and when they have a circular meaning, such as months, week’s days, hours, angular directions such as N, NE, E, SE, etc. Since it is considered to be a very fancy graph, it is often abused and applied also to variables which are not circular at all, especially in marketing dimensions.
Number of shops open per week day (invented data).
Line plot of percentage of people who use an App more than 100 times a month with respect to total Apps’ users, source Flurry Analytics.
Possible variations of the previous two basic types are:
• the bar plot, a version of the column plot with horizontal bars;
• the line plot, which consists in a column plot where a line connects the top of each column. It must not be confused with a mathematical graph, as on the horizontal axis there is not a numerical variable but simply categories. Special care must therefore be taken when using it, and it is best suited when an ordinal variable appears on the horizontal axis, especially if it is a time related variable;
• the area plot, which is identical to the line plot but has the area below the line colored. This is often a misleading element, as it gives the impression that values between two categories really exist (they do not: on the horizontal axis we have only categories, not a continuous variable);
• all the three dimensional versions of these graphs are also commonly used. However, unless there is a strong necessity for an extra dimension, this is only extra graphics which moves the focus away from the main point and confuses the viewer.
2.2. One scale variable

Two graphs are used to represent the distribution of one scale variable, both requiring grouping
procedures.
The most famous is the histogram. It is built by deciding a priori a subdivision of the variable’s possible
values into intervals and then counting how many cases fall into each interval. The result obviously depends
on the choice of intervals, which should be chosen wisely, as too few intervals would result in a useless
histogram, while too many intervals would produce an up and down histogram with many intervals
containing 0 cases or 1 case.
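The counting step can be sketched in a few lines of Python. The helper `histogram_counts`, the value range and the randomly generated data below are assumptions made for this example only; they do not reproduce the data behind the figures.

```python
import random


def histogram_counts(values, low, high, n_intervals):
    """Count how many values fall into each of n equal-width intervals."""
    width = (high - low) / n_intervals
    counts = [0] * n_intervals
    for v in values:
        if not low <= v <= high:
            continue  # this sketch simply ignores values outside the range
        # The upper bound `high` is assigned to the last interval.
        index = min(int((v - low) / width), n_intervals - 1)
        counts[index] += 1
    return counts


rng = random.Random(0)  # fixed seed so the invented data are reproducible
values = [rng.triangular(1000, 5000, 2500) for _ in range(100)]

few = histogram_counts(values, 1000, 5000, 5)
many = histogram_counts(values, 1000, 5000, 50)
sparse = sum(1 for c in many if c <= 1)  # intervals holding 0 or 1 case
print(few, sparse)
```

With 100 cases spread over 50 intervals, a large share of the intervals necessarily holds very few cases, which is the up and down effect mentioned above.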

Histogram with 5 intervals for 100 cases (invented data).
Histogram with 50 intervals for 100 cases (same invented data).
Histogram with 15 intervals for 100 data (same invented data).
Histogram with the first interval 3 times wider (same invented data). There are 8 cases in the first box, 7 in the second one even though its area looks much smaller.
Intervals must have the same length. For example, if we build a histogram with an interval 3 times wider than the others, it contains the cases which should be in 3 intervals and thus it is as high as the sum of three intervals, something which is clearly misleading. Moreover, it is also 3 times wider, which gives the wrong visual impression of a bigger importance for that part.
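The effect of a wider interval can be checked numerically. In the sketch below the counts are invented and the helper names are hypothetical. Dividing each count by its interval width is a common remedy that the text does not discuss; it is shown here only as an assumption about how comparability could be restored.

```python
def merge_first_intervals(counts, k):
    """Merge the first k equal-width intervals into a single wider one."""
    return [sum(counts[:k])] + list(counts[k:])


def height_per_unit_width(counts, widths):
    """Rescale each count by its interval width (not covered in the text)."""
    return [c / w for c, w in zip(counts, widths)]


# Invented counts for five intervals of equal width.
equal_width_counts = [2, 3, 3, 7, 5]

# The first interval is now 3 times wider and collects the first three counts.
merged_counts = merge_first_intervals(equal_width_counts, 3)
merged_widths = [3, 1, 1]

rescaled = height_per_unit_width(merged_counts, merged_widths)
```

Here the merged interval holds 8 cases against 7 in the next one, so it is drawn both taller and wider, although none of the three original intervals was as crowded as its neighbor.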
Less famous, but in some cases much more efficient, is the boxplot. It is a visual representation of the two central quartiles through a rectangle cut by a black line representing the median. It is thus immediately evident where the central 50% of the values’ distribution is. Then it depicts two T‐shaped whiskers which usually indicate where the central 90% of the distribution lies. Other values, called outliers or extremes, are indicated with symbols.
Boxplot of daily log returns of 3M company quoted at NASDAQ in 2012, source www.portfolioprobe.com.
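The numbers behind a boxplot of this kind can be sketched with Python's standard `statistics` module. The function `boxplot_summary`, the choice of the "inclusive" quantile method and the sample values are assumptions for illustration; plotting software may use slightly different quantile conventions and whisker rules.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, central-90% whiskers and the values lying outside them."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # 19 cut points for 20 equal groups: the first is the 5th percentile,
    # the last is the 95th percentile.
    cuts = statistics.quantiles(data, n=20, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": low,
        "whisker_high": high,
        "outliers": [v for v in data if v < low or v > high],
    }


# Invented daily returns (in percent) for a hypothetical security.
returns = [-4.1, -1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.1, 0.2, 0.2,
           0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 1.3, 1.8, 5.2]
summary = boxplot_summary(returns)
```

The rectangle would span `q1` to `q3`, the line inside it sits at `median`, the whiskers reach `whisker_low` and `whisker_high`, and the entries of `outliers` would be drawn as separate symbols.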
One scale variable has also another family of representations which depicts a completely different aspect. Whenever we are interested in displaying the values case by case and not the distribution of all the values, we can use the column plot (with all its variations) or the radar plot with cases instead of categories and variable’s values instead of frequency of cases. This kind of representation is perfectly equivalent, also from the visual power’s point of view, to writing a table with the numeric values and works only when the number of cases is small. Very often in this case the pie chart is totally inappropriate.

Country          GDP per capita (USD) in 2011
United States    14.991.300.000.000.000
China             7.203.784.000.000.000
Japan             5.870.357.000.000.000
Germany           3.604.061.000.000.000
France            2.775.518.000.000.000
Brazil            2.476.651.000.000.000
United Kingdom    2.429.184.000.000.000
Italy             2.195.937.000.000.000
[Column plot: GDP per capita (USD) in 2011, same data as the table]

Month    Average monthly precipitation (cm) in Bozen
Jan      1,33
Feb      1,06
Mar      1,54
Apr      3,37
May      4,2
Jun      6,04
Jul      6,27
Aug      4,9
Sep      3,79
Oct      4,6
Nov      3,51
Dec      1,53
[Radar graph: Average monthly precipitation (cm) in Bozen, same data as the table]
2.3. Nominal versus nominal variables
The representations of two nominal variables are variants of the one nominal variable graphs. Usually the one with more categories is displayed as before, while the categories of the other one are represented side by side, line by line using different colors (in case line or radar graphs turn out to be appropriate) or also one above the other, when it makes sense to sum the categories and thus give also information on the sum for each category of the first variable.
In this case it makes sense to use a 3D graph, where the second nominal variable is represented on the third dimension. However, the information on the exact proportions is still lost.
Average session frequency per month of smartphone usage by operating system, source Flurry Analytics.
Early‐stage entrepreneurial activity in the UK by age group, source “GEM Global adult population survey (APS) 2002‐2011”.
European, US and Japanese patent costs, source Mejer, M., and Van Pottelsberghe (2009) “Beyond the prohibitive cost of patent protection in Europe”.
Average session frequency per month of smartphone usage by operating system, source Flurry Analytics.
Sometimes a stacked area graph can be really meaningful. Smartphone shipments by operating system, source www.gartner.com
2.4. Nominal versus scale variables
Also the nominal versus scale representations are variants of the scale variable representation, with the nominal variable’s categories depicted side by side or one over the other. For histograms a special warning is necessary: when the number of cases in each category is strongly different, rescaling the vertical axes for visibility purposes can be misleading, while not doing it can squeeze a distribution to the horizontal axis.
Boxplots of daily log returns at NASDAQ in 2012 for different companies, source www.portfolioprobe.com
Histograms, with vertical axes rescaled, of heights by sex of CSE students in 2009, source http://www.wustl.edu.
2.5. Scale versus scale variables
The traditional way to represent two scale variables is the scatterplot, a graph with two numerical axes where each case is represented with a symbol. Whenever the cases are in horizontal axis values’ order and whenever this order makes sense, a mathematical graph can be built joining the symbols with lines. These lines must not be confused with the regression line (or regression curve in case of a non‐linear regression model), which is a statistical tool commonly used in combination with a scatterplot to forecast non‐observed values.
Scatterplot of number of medicines used daily per 1000 inhabitants and average cost of daily medicine dose. Cases are regions, source “Relazione sanitaria – Provincia autonoma di Bolzano – 2005”
Scatterplot with lines of photosynthesis in a biological experiment, source chapter 3 of book ISBN 978‐953‐307‐283‐8.
Scatterplot with regression line, source http://markwadsworth.blogspot.it, data from OECD.
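To show how a regression line differs from simply joining the symbols, here is a minimal Python sketch of an ordinary least squares line. The text does not say which regression method the figures use, so least squares is an assumption here, and the function names and data points are invented.

```python
def regression_line(xs, ys):
    """Least squares slope and intercept for the points (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, the line is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(x, slope, intercept):
    """Value of the regression line at a non-observed x."""
    return slope * x + intercept


# Invented observations for two hypothetical scale variables.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = regression_line(xs, ys)
estimate = forecast(6.0, slope, intercept)
```

Unlike the lines that join consecutive symbols, the regression line generally does not pass through the observed points; it summarizes their overall trend and can be evaluated at values of the horizontal variable that were never observed.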
When the number of scale variables grows to three, the surface plot is borrowed from mathematics to display the effect of two variables on a third one. It is a tridimensional version of the scatterplot (which would be incomprehensible in three dimensions) with lines and thus works only when the variable on the vertical axis is a function of the ones on the horizontal axes and if the proper rotated view is adopted.
Two‐input one‐output fuzzy controller data, source Fuzzy Logic Toolbox examples for Matlab.
Circle’s size represents country’s population (big red circle is China, big light blue one is India, big yellow one is USA, oldest people are from Japan, poorest country is Congo, richest country is Luxembourg), source www.gapminder.org.
2.5.1. Bubble charts
When the two variables on the horizontal axes do not share the same unit of measure, as for example population and GDP, it is more effective to use another representation, known as bubble chart. In this representation one of the scale variables is displayed increasing the size of bubbles, either the diameter or the area (even though a two dimensional distortion comes into play, see section 4.6). Moreover, colors can be used to represent a fourth nominal or scale variable.
Website www.gapminder.org has a tool to display several bubble charts which move with time, thus introducing also another dimension. However, it works only on their data.
In order to produce bubble charts on personal data Excel or Google Developers (https://developers.google.com/chart/interactive/docs/gallery/bubblechart) can be used.
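When a bubble's area, rather than its diameter, is meant to carry the value, the radius has to grow with the square root of the data. The following Python sketch illustrates this; the function `bubble_radius`, its `area_per_unit` parameter and the population figures are hypothetical.

```python
import math


def bubble_radius(value, area_per_unit=1.0):
    """Radius of a circle whose area equals value * area_per_unit."""
    if value < 0:
        raise ValueError("a bubble cannot represent a negative value")
    return math.sqrt(value * area_per_unit / math.pi)


# Invented populations (in millions) for three hypothetical countries.
populations = {"X": 1.0, "Y": 4.0, "Z": 9.0}
radii = {name: bubble_radius(p) for name, p in populations.items()}
```

With these numbers country Y has four times the population of X but only twice the radius. Whether viewers compare areas or diameters is exactly the ambiguity that makes bubble sizes hard to read.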

3. Specific charts
Several fields make use of particular variables for which data have well‐known properties, which can be used to display particular graphs which turn out to be more appropriate and more understandable by expert eyes than standard ones.
3.1. Data map

When one of the involved variables is a geographic location it can theoretically be converted into two scale variables using its geographic coordinates. However, instead of using methods for three or more scale variables, we can take advantage of the existence of a well‐known map, immediately recognizable by the viewers, and depict the chart on the map itself. If the other variable is a scale variable, we can build a color scale, while if the other variable does not exist we can build practically a scatterplot using symbols to mark the geographic locations but drawn above a map.
Source Annals of Internal Medicine, vol. 133, n. 2, pages 161‐162.
Black lines are cholera cases 19 Aug to 30 Sept 1854 as drawn by John Snow, source ISBN 9781130088281.
Other elements can be added to data maps, using the fact that we can display a proportion using the element’s size as well as a direction using the map’s orientation. A typical example is the flows of traffic from various points of the map. Whenever relations among elements of the map are displayed on the map itself, we must unavoidably cut those which are too small, as otherwise the number of relations grows roughly with the square of the number of elements and can easily make the entire picture incomprehensible.
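A short Python sketch shows how quickly pairwise relations pile up and how the smallest ones might be cut. It assumes undirected relations between pairs of locations, giving n(n−1)/2 pairs for n locations (twice as many if direction matters). The place names, flow sizes and helper functions are invented for the example.

```python
from itertools import combinations


def count_relations(n_elements):
    """Number of undirected pairs among n elements: n * (n - 1) / 2."""
    return n_elements * (n_elements - 1) // 2


def keep_largest(flows, how_many):
    """Keep only the `how_many` largest flows, dropping the small ones."""
    ranked = sorted(flows.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:how_many])


# Four hypothetical locations and invented traffic between each pair.
places = ["A", "B", "C", "D"]
sizes = [120, 5, 40, 3, 75, 8]
flows = dict(zip(combinations(places, 2), sizes))

main_flows = keep_largest(flows, 3)
```

Four locations already give 6 possible relations and fifty locations give 1225, so a threshold or a fixed number of displayed flows is needed long before the map covers a whole continent.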
Exports by sea of French wines in 1864, source C.J. Minard 1865.
1999 volume of minutes of voice telecommunication. Only the principal route pairs are shown to avoid
cluttering the map. Circular symbols on capital cities represent the country's total annual outgoing traffic,
source http://mappa.mundi.net and www.telegeography.com
3.1.1. Statsilk software

A very efficient free software, which can load a map as background and data from a spreadsheet and
display them over the map in a variety of ways including side graphs, is developed by
Statsilk (www.statsilk.com) and comes in three packages:
• StatPlanet, a web‐based and desktop software for visualizing spatial data through interactive maps,
graphs and charts
• StatTrends, a web‐based and desktop software for visualizing any kind of data through interactive
graphs and charts
• StatWorld/StatPlanet, a web‐based and desktop software for exploring world statistics from various
international organizations through interactive visualizations.
These programs are interactive, meaning that graphs’ properties can be changed with a simple click, and
web‐based, meaning that they can work through a browser connected to Statsilk’s website.
3.2. Time series

3.2.1. Financial chart
Financial data use the scale variables time, low price, high price, open price and close price, and the main
interest is usually depicting prices as a function of time. They have the property that low is never larger
than any of the others and high is never smaller than any of the others. Therefore they can be represented
using a candlestick chart, where time is on the horizontal axis, the central rectangle has values open and
close as extremes (and is white or green when open is the base, black or red when close is the base) and the
vertical line goes from low to high. In this way five different scale variables can be efficiently represented
together. Sometimes traded volume or another indicator is depicted as a daily column plot below the chart,
using the same horizontal time line.
Special care must be taken when producing or observing a financial graph with missing values, either
because the security is not traded (market closed or security suspended) or because of data errors; in cases
like this it is preferable to leave a blank space even if just one of the four prices is missing.
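The rules above translate fairly directly into code. The following Python sketch is only an illustration: the `DailyPrices` record, the `candlestick` function and the dictionary it returns are hypothetical names, and treating a day with equal open and close as a rising day is an arbitrary choice made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DailyPrices:
    day: str
    open: Optional[float]
    high: Optional[float]
    low: Optional[float]
    close: Optional[float]


def candlestick(p):
    """Describe one candle, or return None to leave a blank space."""
    prices = (p.open, p.high, p.low, p.close)
    if any(v is None for v in prices):
        return None  # one missing price is enough to skip the whole day
    if p.low > min(p.open, p.close) or p.high < max(p.open, p.close):
        raise ValueError(f"inconsistent prices on {p.day}")
    rising = p.close >= p.open  # open is the base of the rectangle
    return {
        "body": (min(p.open, p.close), max(p.open, p.close)),
        "line": (p.low, p.high),
        "color": "white" if rising else "black",
    }


# Invented prices for three hypothetical trading days.
days = [
    DailyPrices("day 1", 10.0, 12.0, 9.5, 11.0),
    DailyPrices("day 2", 11.0, 11.5, 9.0, 9.8),
    DailyPrices("day 3", 9.8, None, 9.1, 10.2),  # high price missing
]
candles = [candlestick(d) for d in days]
```

The consistency check is a cheap way to catch data errors before they reach the chart, since a low above the open or close cannot be drawn as a meaningful candle.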
Telecom Italia daily open, close, low, high prices, source Yahoo!. Horizontal axis skips Saturdays and Sundays without leaving any blank space, other market closed days have a blank space instead.
A similar representation is typical for weather, as it shares the high and low of financial data, as well as the presence of other indicators such as precipitation and humidity.

3.2.2. Gantt chart
A very alternative way to display a starting and ending time of an event which moves ahead (in space or moves on with its planned activities) is using one dimension as time and the other one as moving space, connecting the starting and ending points with a line. The steepness of the line indicates how fast that part is proceeding.
When dealing with project’s planning, this graph uses bars and takes the name of Gantt chart, in honor of Henry Gantt.
Gantt chart for Windows support, source www.xpisobsolete.com
Timetable of trains from Paris to Lyon operating in 1880, red line is TGV train in 1981, sources E.J. Marey 1885 and E.R. Tufte 2001.
3.2.3. Narrative graph
Figurative map of French army in Russian campaign 1812‐1813 drawn by C.J. Minard in 1869 is the fundamental example of a narrative graph. This graph manages to depict in a very efficient way on the same chart several scale variables: the coordinates of the army’s position including all possible branches, the army’s size and the weather temperature. Moreover, it includes useful elements such as the main battles and the crossed rivers. Everything which is relevant for the campaign is reported, with the only exception of the timeline which is only sketched for the retreat and is not in scale (as the horizontal axis is in geographical and not temporal scale).
Other elements can be represented on a figurative map, bearing in mind that if the horizontal axis is dedicated to the timeline then the geographic information must be squeezed in the vertical axis only or inserted through other means.

Figurative map of the men losses of the French army in the Russian campaign 1812‐1813, source C.J. Minard 1869.

A new chart of history, source Lectures on History and General Policy by J. Priestley 1788.
4. Misleading factors
4.1. Distractible elements
Any element which is not directly related to the focus of the representation should be avoided, especially
when it is only an embellishment of the chart. The typical example is inserting pictures which do not help
comprehensibility inside bar plots.
As can be seen in these two pictures, the elements added in the column bars do not help the
comprehensibility at all. They represent neither the years nor, in the second graph, the three analyzed
variables, for which displaying a picture related to the variable itself would have been much better.
Their only task is to distract the viewer, in the first one with the help of a three‐dimensional object
(see section 4.6) and in the second one from the non‐zero start of the vertical axis (see section 4.5).
Source Beautiful Evidence by E. Tufte. Source Day Mines 1974 Annual Report.
In this picture instead a fancy left to right perspective effect has been introduced. It does nothing to
improve the comprehensibility of the data, while it makes comparing the bars a very difficult task and gives
the impression of a much larger increase.
Black columns represent New York State aid to localities in billions of USD,
total column represent total budget, source New York Times 1 Feb 1976.
4.2. Elements in the wrong place

The position of elements must be planned with special care, as the viewer can infer that it carries a
special meaning. In case it does not, this must be evident. An example is one of the versions of the
world‐famous billion dollar gram, which depicts some of the world’s expenditures on a two
dimensional scale. The graph is without any doubt very impressive and efficient in comparing some
expenses to others. However, apart from the impact on the viewer, there is no need to use two
dimensions: one dimension is enough, as only one variable is displayed. The only advantage of
using two dimensions is the possibility to easily insert some rectangles inside other ones. However, this
automatically induces in the viewer the idea that a rectangle is part of the outer one, as is correctly
depicted for Online Advertising as part of Worldwide Advertising Spend. The viewer is then expected
to see the same for Iraq War, but the three rectangles are not part of the total one, giving the impression
of a larger expense.

Upper part of billion dollar gram by D. McCandless, source www.informationisbeautiful.net
4.3. Wrong proportions between data and sizes

Respecting proportions between numerical data and graphical elements’ sizes is one of the basic principles
of any visual representation. In the billion dollar gram the proportion to be respected is between numerical
data and areas, but the Facebook area, which corresponds to 15 billion USD, should be less than one third of
the Bill Gates area, which evidently it is not, as can be estimated with a simple ruler. The picture below
shows instead a billion dollar gram where no evidently wrong proportion and no wrongly placed element
seems to appear.

Upper part of billion dollar‐o‐gram by D. McCandless, source www.informationisbeautiful.net
4.4. Different units of measure
Data with different units of measure should never be put side by side, unless their units of measure are very clear or unless they are properly rescaled. Apart from evident differences in units of measure, in the examples below the last category refers to a shorter sampling period, which is evident only in the second picture. In both cases a rescaling would have been very appropriate.
Source Science Indicators 1974, National Science Foundation. Source New York Times 8 August 1978.
4.5. Not starting from zero
Starting the axis from a number which is not zero is a legitimate choice whenever it is necessary to emphasize small differences (and using logarithmic scale is not appropriate), but it must be very evident and possibly underlined. In the second example of section 4.1 the three column plots not only do not start from zero without any notice nor number on the vertical axes, but they start from a largely negative number without any necessity, since the variables represented by the first and the third plots cannot have negative values, and the three plots have a different scale and zero level. This also hides the fact that the first column of the second plot is negative.
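How much a truncated axis exaggerates a difference can be estimated by comparing the drawn bar heights with the data. The Python sketch below is a simple illustration; the function `bar_height_ratio` and the two values are invented.

```python
def bar_height_ratio(value_a, value_b, axis_start=0.0):
    """Ratio between the drawn heights of two bars on an axis starting at axis_start."""
    height_a = value_a - axis_start
    height_b = value_b - axis_start
    if height_a <= 0 or height_b <= 0:
        raise ValueError("both values must lie above the start of the axis")
    return height_b / height_a


# Two invented values that differ by 10%.
honest = bar_height_ratio(100.0, 110.0)                      # axis starts at 0
truncated = bar_height_ratio(100.0, 110.0, axis_start=90.0)  # axis starts at 90
```

On the axis starting from zero the second bar is drawn 1.1 times as tall as the first, matching the data. On the axis starting at 90 it is drawn twice as tall, which is why the starting value must be made evident.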
4.6. Two and three dimensional growth
Whenever a single scale variable is represented using a two dimensional image special care must be taken in the proportions. The viewer is faced with two different scales: the vertical size and the area size. Only if the horizontal base is kept constant is it possible to respect both the proportions among data and vertical sizes and areas. If also the horizontal base of the graphical element is rescaled, then the area becomes obviously proportional to the square of the data, as can be seen in the man in the first picture of section 4.1, and in the two pictures below. In the first one, the fact that the element is also a three dimensional barrel introduces an even further misleading effect, as the perceived volume is the cube of the linear size.

Source Time 9 April 1979. Source Washington Post 25 April 1978.
8.
A three dimmensional growth on an abstract eleement is disp played in the e next picturre, where a column of aa

standard co
olumn is exp a three dimeensional object which clearly gives tthe impression of beingg
ploded into a
much taller and with a larger volume than the tw wo dimensio onal ones.
Much subtler is the gro owth effect o
of the next ssecond graph, which dep picts the vollume of water and fresh h
water comp pared to Earth’s volume.. The volumee’s proportio ons are prob
bably correctt, but what is misleadingg
here is thatt the vast maajority of wa
ater is scatteered on 20 K
Km of Earth’ss surface deppth, which iss exactly thee
place wheree water is neeeded. So it sshould comppared to the surface of Earth, eventuually multiplied by 20 Km m
to have a voolume, and n not to the en
ntire Earth’s vvolume as w
water will nevver flow dee ply inside Ea
arth. Lookingg
at any terreestrial globe immediatelyy shows that water is far from scarce on Earth’s ssurface.

Source New York Times, 19 December 1978. Source www.8020vision.com
