
Chart format guide

This guide sets out some principles and conventions for statistical charts. It is a version of the guide used in the House of Commons Library. The first part sets out some general principles to follow for any chart and then covers the default formatting used by the Social & General Statistics Section of the Library. Charts are by their nature more subjective than tables. A chart works on visual as well as analytical levels and is open to greater interpretation. There are far more options to choose from, covering types of chart, colour, size, dimensions, labelling, scales etc. Therefore the Library has only a limited number of definitive, hard and fast ‘rules’ covering charts. The appendix to the guide goes beyond the standard formatting of charts used in the Library. It is intended to be thought provoking rather than prescriptive. It makes some suggestions on further aspects of chart layout, such as when to use certain chart types, highlights the weaknesses of other chart types and looks at combining multiple charts.

This guide draws on the work of a number of different authors, particularly:

- Edward Tufte, The Visual Display of Quantitative Information (1987); Envisioning Information (1990); Visual Explanations: Images and Quantities, Evidence and Narrative (1997); and Beautiful Evidence (2006)
- Jacques Bertin, Semiology of Graphics: Diagrams, Networks, Maps (1983)
- William S. Cleveland, The Elements of Graphing Data (1985)

Much of this guide focuses on the different elements and capabilities of Excel (2003) charts, but it does not include step-by-step instructions or any guidance on advanced charts (those which extend Excel’s basic capabilities).
General principles
The general principles that should be applied to the choice and construction of any chart are:

- Accuracy – the patterns in the chart accurately reflect the underlying data.
- Economy – the chart includes only those elements which display the data and those necessary to understand it.
- Clarity – the patterns/values the chart depicts are as easy as possible for the reader to interpret.

[Figure: Before and after: Proportion of pupils in large classes (>30), England 1978-, primary and secondary schools]

One might also add the aims of self-sufficiency (the chart needs as little text explanation as possible) and high data density (the data elements of the chart take up as much of the space as possible and include the maximum amount of information per unit of area), but both are related to economy and clarity.
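Tufte, whose work this guide draws on, gives data density a simple formula. It is stated below. The worked figures underneath it are hypothetical (two series of 30 annual values) and are not taken from the charts in this guide.

```latex
\text{data density} = \frac{\text{number of entries in the data matrix}}{\text{area of the data graphic}}

\frac{2 \times 30}{60\ \text{cm}^2} = 1\ \text{number per cm}^2,
\qquad
\frac{2 \times 30}{30\ \text{cm}^2} = 2\ \text{numbers per cm}^2
```

Plotting the same data in half the area doubles the density. This is one reason the appendix argues for smaller charts.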


[Figure: Before and after: Subsidies and other payments made to farmers in the UK, £ billion 2007 prices, 1973 to 2006 (payments linked to production; decoupled and other payments). Source: Agriculture in the UK 2006, Defra]

A default Excel chart violates many of these principles. Some basic changes will make very clear improvements. Before and after examples are illustrated opposite. These steps will improve a chart, but they will not always produce an ideal chart, which can require extending the general principles with some additional thought and effort. The steps listed in this guide are necessary but not always sufficient to produce the best chart possible. Some other suggestions are given in the appendix to this guide.
First steps – what to leave out
The principle of economy means that each mark on the chart should have some meaning, that none of the elements are superfluous and that ‘data ink’ (the ink that shows the pattern in the underlying data) is maximised compared to the other chart elements. This leads to the following rules for the exclusion of specific elements of a default Excel chart (also illustrated opposite):

- Chart area – no border around the chart
- Plot area – no border and no pattern (colouring in)
- Legend – no border
- Gridlines – none
- Colour elements of data series – no borders and no markers

[Figure: Step-by-step impact of removing Excel defaults (column charts of Series1 across categories A to F)]

These remove the worst of Excel’s default formatting and concentrate the reader’s mind (and eye) on the data. The next section looks in more detail at different elements.
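The ‘data ink’ idea comes from Tufte, who turns it into a ratio. His definition is given below. Each default element removed in the list above (borders, patterns, gridlines, markers) is non-data ink, so removing it pushes the ratio towards 1.

```latex
\text{data-ink ratio}
= \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
= 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```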
Next steps – what to include
Font – Arial throughout.
Title – as with tables this needs to be descriptive, including the what, where and when. It may also include units. The font needs to be larger than all other text elements and in bold.
Source – needed if not given in an associated table. Font smaller than all other text elements and in italics. Placed at the bottom of the chart.
Y-axis units – to minimise the space taken up by labels, include as few digits as possible and add the magnitude in the title (e.g. 1, 2, 3, etc. instead of 1,000,000, 2,000,000, 3,000,000 etc. – see opposite). Rounding and decimal places must be consistent. Text size between title and source sizes, same as legend and x-axis labels.

[Figure: Y-axis labels moved to title and plot area increased: Frequency of categories, UK, 2007 (axis 0 to 6,000) becomes Frequency of categories, UK, 2007, thousands (axis 0 to 6)]
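The y-axis units rule can be expressed as a small calculation. Below is a minimal sketch. `rescale_for_title` and its magnitude table are hypothetical illustrations, not an Excel feature. The helper finds the largest power of a thousand that the biggest value reaches, divides every value by it and returns the word to add to the title.

```python
# Hypothetical helper (not an Excel feature): rescale axis values so the
# labels need as few digits as possible, and return the magnitude word
# to move into the chart title.
MAGNITUDES = [
    (1_000_000_000, "billions"),
    (1_000_000, "millions"),
    (1_000, "thousands"),
]


def rescale_for_title(values):
    """Return (scaled_values, magnitude_word); the word is '' if no scaling."""
    largest = max(abs(v) for v in values)
    for divisor, word in MAGNITUDES:
        if largest >= divisor:
            return [v / divisor for v in values], word
    return list(values), ""


ticks = [0, 2_000, 4_000, 6_000]
scaled, word = rescale_for_title(ticks)
print(scaled)  # [0.0, 2.0, 4.0, 6.0]
print(f"Frequency of categories, UK, 2007, {word}")
```

To keep rounding consistent, format every scaled label with the same number of decimal places (for example `f"{v:.0f}"`).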
X-axis labels – in most cases these will be categories or dates. These must be horizontally aligned. This may mean that Excel does not include every date (preferable to abbreviating dates to 00/01, 01/02 etc.) or that it wraps long category titles. Consider switching from a column to a bar chart if the latter is a problem (see opposite). Use country abbreviations for column charts. Text size smaller than title, bigger than source, same as legend and y-axis labels.

[Figure: Chart changed from column to bar to make categories easier to read: Frequency of categories, UK, 2007, thousands (‘First lot of things’ to ‘Sixth lot of things’)]
Axis titles – if you must include them they should also be horizontally aligned (this means including them at the top of the y-axis). It is frequently better to put the units in the title or subtitle. This keeps all the description in one place and again allows you to maximise the data plot area (see opposite).

[Figure: Steps from default axis label, to an improved version, then label in subtitle (preferred): Spending by categories, UK, 2007, £ billion cash]

Axis ‘tick marks’ – ensure they are set to outside (the default). If you are using a line chart then checking the ‘value (Y) axis crosses between categories’ option on the scale tab will make better use of the plot area.
Legend – direct labelling of lines is preferable. If you include a legend, add it into an empty part of the plot area. The defaults of top/side/bottom all squeeze the area devoted to showing the data (see below opposite).
Line-type – do not select ‘smoothed’.
Data labels – avoid wherever possible. They go against the principle of economy and your chart should not need them.
[Figure: Legend moved inside plot area which is then expanded: Spending by categories, UK, 2007, £ billion cash]

Colour
The default colour used by the Library is ‘House of Commons green’ (the green option on the default range of colours). Where more than one colour is needed use shades of this green. There are eight shades of this green used by the Library (shown below). Choose contrasting shades as far as possible.

Use colours, not patterns, for different categories in a line chart. Dotted lines can be used where they add meaning (e.g. for projections or breaks in series). Do not use fill effects for varying chart colours, as the pattern options look poor when printed or converted to a PDF document.

Ordering categories
Most category data (types of things such as areas, groups of people, industry, modes of transport and crimes) can be reorganised or sorted by value in charts to help the reader better understand the data (opposite). The result is a better idea of the distribution of the data while the information on individual values is retained. The category axis then effectively ranks the different categories by value. The default ordering should be from highest to lowest.

[Figure: Ordering the categories improves the display of the distribution and gives a ranked list on the x-axis (categories A to F, then reordered F, A, C, E, D, B)]
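Sorting from highest to lowest is a single sort on the values. Below is a minimal sketch using Python’s built-in `sorted`. The values are hypothetical. They were chosen so that the result matches the F, A, C, E, D, B order in the figure, but the figure’s real values are not given.

```python
# Sort category data by value, highest first, so the category axis
# becomes a ranked list. The values are hypothetical.
data = {"A": 4.0, "B": 1.0, "C": 3.5, "D": 2.0, "E": 3.0, "F": 5.0}

ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
categories = [name for name, _ in ordered]
values = [value for _, value in ordered]

print(categories)  # ['F', 'A', 'C', 'E', 'D', 'B']
```

Python’s sort is stable, so categories with equal values keep their original (‘standard’) order relative to each other.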

Occasionally category charts are left unordered and the ‘standard’ order is given. There are a few instances where categories cannot be re-ordered (strictly sequential categories such as social class or age groups), but even categories that are normally given in a set order (ethnicity, industrial classification, religion, sources of emission etc.) can be sorted when included in a chart.

Ordering a very large number of categories, such as local authorities, means you lose the category identity. There is not space to include them in all but the largest of charts. You are left with a chart of what is effectively a very similar shaped curve for most distributions that are approximately normal. This gives the reader very little information apart from the max/min and the fact that most values are close to the median. A frequency distribution chart (opposite) better identifies outliers. It also gives the reader some information they can use, such as the numbers above or below a given value or in a specific range.

[Figure: The first chart imparts little information other than max, min and rough average. The frequency distribution alternative is more informative. (Sorted values, 0% to 90%; number of cases in bands from 35%- to 85%-)]
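A frequency distribution chart starts from counts of values in equal-width bands. Below is a minimal sketch using the standard library. The 5-point bands from 35% to 85% mirror the figure’s axis. The sample values and the ‘below’/‘above’ buckets are assumptions added for illustration, so that outliers are counted rather than dropped.

```python
from collections import Counter


def frequency_distribution(values, start=35, width=5, stop=90):
    """Count values in bands [start, start+width), ... up to stop.

    Bands are labelled like '35%-'. Values below start or at/above stop
    are counted under 'below' and 'above' so outliers stay visible.
    """
    counts = Counter()
    for v in values:
        if v < start:
            counts["below"] += 1
        elif v >= stop:
            counts["above"] += 1
        else:
            lower = start + int((v - start) // width) * width
            counts[f"{lower}%-"] += 1
    # Counter returns 0 for missing keys, so empty bands still appear.
    table = {f"{b}%-": counts[f"{b}%-"] for b in range(start, stop, width)}
    table["below"] = counts["below"]
    table["above"] = counts["above"]
    return table


# Hypothetical shares for ten areas.
shares = [36.2, 41.0, 47.5, 52.3, 53.9, 58.1, 61.4, 62.0, 88.7, 30.1]
for band, n in frequency_distribution(shares).items():
    print(band, n)
```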

Axis values
Missing dates
Where a time series has missing values for certain dates, or the gaps between data points vary, the chart must display this. If it just places the dates with data next to each other it violates the accuracy principle (see opposite).

[Figure: Excel’s default ‘concertina-ing’ of dates and the correct shape of the data: SGS enquiries (thousands), 1986 to 2004]
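The difference between the two charts comes down to where each point is placed along the x-axis. The sketch below uses the years shown on the left-hand chart and compares equal (category) spacing with spacing by date. It only computes positions and makes no assumption about particular Excel settings.

```python
# Years that have data, as on the left-hand chart in the figure.
years = [1986, 1992, 1996, 2000, 2001, 2002, 2003, 2004]

# Category axis: every observation gets the same spacing, so the six-year
# gap 1986-1992 looks the same as the one-year gap 2000-2001.
category_positions = list(range(len(years)))

# Time (value) axis: each point is positioned by its date, so gaps in
# the series stay visible and the shape of the data is preserved.
time_positions = [year - years[0] for year in years]

print(category_positions)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(time_positions)      # [0, 6, 10, 14, 15, 16, 17, 18]
```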

Shortened y-axis
If the data you add to a chart does not vary a great deal, Excel will be ‘helpful’ and cut the y-axis, i.e. it will not start at zero. This means that underlying patterns in the data are magnified. In other words the accuracy principle is violated. A helpful way of thinking about this is to quantify the degree of magnification. The magnification shown opposite (by Excel default) is 10. In other words the variation in the chart produced by Excel (on the left) is 10 times greater than the real variation (on the right). This degree of magnification has been dubbed the ‘lie factor’.¹

[Figure: The same data plotted first as Excel would do by default (y-axis 90 to 120), then the uncut version (y-axis 0 to 120)]
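Tufte defines the lie factor as the size of the effect shown in the graphic divided by the size of the effect in the data. Below is a minimal sketch that applies this to a cut y-axis. It takes the ‘effect shown’ to be the proportional change in plotted height above the axis minimum. The values 100 and 110 and the cut at 90 are hypothetical, because the data behind the chart opposite are not given.

```python
def lie_factor(first, last, axis_min=0.0):
    """Tufte's lie factor for a change from `first` to `last` plotted on a
    y-axis that starts at `axis_min` rather than zero."""
    if first == last:
        raise ValueError("no change to magnify")
    if first <= axis_min:
        raise ValueError("the axis must start below the first value")
    # Proportional change in plotted height above the axis minimum.
    shown = (last - first) / (first - axis_min)
    # Proportional change in the data itself.
    actual = (last - first) / first
    # This simplifies to first / (first - axis_min).
    return shown / actual


print(lie_factor(100, 110, axis_min=90))  # 10.0
print(lie_factor(100, 110))               # 1.0 with a full axis
```

Because the ratio simplifies to `first / (first - axis_min)`, the magnification depends only on how close the cut is to the data and not on the size of the change.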

This issue is looked at in more depth in the appendix, which gives some suggested alternatives to cutting the y-axis and sets out what should be done when there is no alternative.

¹ See Edward Tufte, The Visual Display of Quantitative Information (1987), p.57
Appendix
Below is a collection of alternatives to ‘standard’ charts with an illustration of how they can build on and improve the earlier basic ‘rules’. All these follow from the principles outlined at the start of the main guide. The sections cover shortening the y-axis, size, dimensions, the choice of chart type and expanded or composite charts. As mentioned earlier, the examples and suggestions given here are meant to be thought provoking. The aims are to get authors to think more thoroughly about how data should be displayed in a chart and to offer some general guidance. The recommendations are:

Shortening the y-axis
Cutting the y-axis violates the accuracy principle. Social Trends has a very large number of charts, but the latest edition has only one with a shortened y-axis. There are a number of alternative approaches. If none of these help and you can still justify shortening the axis, then at the very least you should indicate the axis is shortened with a zig-zag.

Chart size
Many of the charts we produce are much bigger than they have to be. An appropriate use of smaller charts means one can better integrate charts with text. It also gives you greater flexibility about the placement of charts.

Dimensions
The default aspect should be landscape. There are some other times where you might want to change the aspect ratio to help the reader better understand the data.

Chart type
Simple charts are more effective. Pie charts are less clear than bar/column charts. Stacked charts are often less clear than a sequence of simple line/bar/column charts. The pie chart falls down on the principles of self-sufficiency and economy. It has other limitations and therefore should only be used in exceptional circumstances. Line charts should be the default option for time series charts.

Dual axes charts
Any dual axis chart is potentially misleading, even one with combined chart types (e.g. lines and bars). These charts should therefore not be used.

Expanded or composite charts
Expanded or composite charts are multi-chart displays intended to let the reader make direct comparisons between the (related) data. Because each single chart has a single series, there are no problems with clarity of stacked series and no jumble of multiple crossing lines or confusing side-by-side columns. They should be considered as alternatives to stacked or side-by-side charts for both category and time series data.
Shortening the y-axis
The section in the main guide gives a general introduction to the issue. The main point is that cutting the y-axis violates the accuracy principle.

We cannot always assume that people use charts to look back at precise values on the y-axis, or even look at the values at all. Charts illustrate patterns and the reader’s initial focus will be on the shape shown, i.e. the relative values. Take the chart opposite (lie factor 4): what are the actual proportionate changes? With a full axis the shape of the data elements gives an accurate illustration of the change over this period. To do this for the chart opposite you need to look back to the y-axis to see the change in numbers and do a relatively complex sum in your head to come up with the actual change over the period ((270-320)/320). Not every reader can or will do that sum. More fundamentally, charts are not an effective way to get reference information across.

[Figure: The overall trend is clearly down, but by what %? (y-axis 240 to 330, 1990 to 2005)]
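Here is the sum from the paragraph above, worked through. The second line makes two assumptions: the chart’s y-axis starts at 240 (its lowest labelled value), and the ‘effect shown’ is the change in plotted height above that minimum. On that basis the figures reproduce the lie factor of 4 quoted in the text.

```latex
\frac{270 - 320}{320} = \frac{-50}{320} = -15.625\% \approx -15.6\%

\frac{270 - 240}{320 - 240} - 1 = \frac{30}{80} - 1 = -62.5\%,
\qquad
\text{lie factor} = \frac{62.5}{15.625} = 4
```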

The same principle applies to index charts, which simply change the scale on the y-axis. While they only give relative values on the y-axis, they still need the reader to i) read across to values on the y-axis and ii) understand what an index chart means.
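An index chart rebases a series so that a chosen period equals 100, using index = 100 × value / base value. Below is a minimal sketch. The series values are hypothetical. They run from 320 to 270, the start and end values used in the example above.

```python
def to_index(values, base_position=0):
    """Rebase a series so the value at `base_position` equals 100."""
    base = values[base_position]
    return [100 * v / base for v in values]


series = [320, 310, 295, 270]  # hypothetical values
print(to_index(series))  # [100.0, 96.875, 92.1875, 84.375]
```

Rebasing changes only the numbers on the y-axis, and the shape of the line is identical. The reader still has to read across to 84.4 and know that it means a fall of about 15.6% since the base period.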

It is extremely common to see trend charts with a shortened y-axis in the press. A complete y-axis can, like so many facts, get in the way of a good story. Some charts, such as trends in the stock market, are rarely if ever shown with a full axis down to zero. Official statistics use a shortened y-axis much less frequently. Social Trends has a very large number of charts, but the latest edition has only one with a shortened y-axis.

[Figure: Here both an accurate picture of the underlying data and some detail on annual variations are given (full axis 0 to 350, 1990 to 2006, with an inset ‘Detail’ panel on a 260 to 320 axis)]

There are a number of alternative approaches. These include:

- Adding more historical data. This will frequently include a greater range of figures and put relatively small recent changes in context.
- Adding a small version of the chart as inset detail that cuts the y-axis (opposite).
- Ditching the chart. If you have a flat series which shows nothing of any interest, you should ask yourself whether you need to include a chart to show this at all.
- Leaving it alone. Keep the chart as it is if the small changes actually represent an important result in themselves. Try to separate the substantively/politically important results from those that might make a chart look more interesting or nice. Similarly, a chart that shows large variations may have no political relevance or be of little interest to readers.
- Charting the percentage change. This is potentially as misleading as a shortened axis. It may however help in a few cases where the percentage change in an indicator is more important than absolute values. The obvious example is inflation. Problems arise with longer-term trends because over time denominators change, and a 5% increase now may be very different from a 5% change 10 or 20 years ago. Such a chart is less data dense, as it strips out absolute values and gives a poor indication of changes in absolute levels over time. Journalists are not the only ones who confuse a reduction in a percentage increase with a reduction in the underlying absolute value.
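The last point above can be checked with a few lines of arithmetic. Below is a minimal sketch with hypothetical levels. The percentage increase halves from 10% to 5%, yet the underlying value is still rising. The same 5% also means a different absolute amount once the denominator has grown.

```python
def percentage_changes(values):
    """Period-on-period % change: 100 * (current - previous) / previous."""
    return [100 * (cur - prev) / prev for prev, cur in zip(values, values[1:])]


levels = [200, 220, 231]  # hypothetical absolute values
print(percentage_changes(levels))  # [10.0, 5.0]

# A smaller percentage increase is still an increase in the level.
print(all(b > a for a, b in zip(levels, levels[1:])))  # True

# The same 5% is a different absolute amount on a different base.
print(5 * 100 / 100, 5 * 220 / 100)  # 5.0 11.0
```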

If none of these help and you can still justify shortening the axis (i.e. the change is important in actual/substantive/political terms), then at the very least you should indicate the axis is shortened with a zig-zag. This will help the reader and make our charts more truthful. If there is no time (this can be a fairly long process), it should at the very least be mentioned in the accompanying text (“note the shortened axis which overemphasises small changes”). The example opposite sets out an appropriate use and display of a shortened value axis.

[Figure: Monthly mean atmospheric carbon dioxide at Mauna Loa Observatory, Hawaii (parts per million), 1958 to 2008, with estimated pre-industrial concentrations marked. In this example the axis is cut as the observed increase in concentrations is deemed important. In addition, the point at which it is cut is the pre-existing baseline and concentrations have never approached zero. The cut in the y-axis is indicated with a zig-zag line at the base. Source: US National Oceanic & Atmospheric Administration - Global Monitoring Division www.cmdl.noaa.gov/index.php]
Dimensions and scale
Excel produces a chart object in a standard size and dimensions. While these may be adjusted to fit some additional chart elements, there is scope to do much more.

Scale
Many of the charts we produce are much bigger than they have to be. The pattern in the data stays the same however small the chart, and the human eye is capable of accurately perceiving very small differences (such as in letters, fonts and maps). As an illustration, the charts opposite show the same data in an increasingly small area. Each subsequent chart takes up half the space of its predecessor. The first chart already has a relatively high data density. What exactly is lost with the reduction in scale? The major loss is in text size, which soon becomes unreadable. The key aspects of the trend (the macro or global reading) are still clear in the smallest chart. The ability to read across to exact dates/values (the micro or elementary level of reading) starts to fall off from a 75% reduction and greater. There is clearly some loss in trend detail, but this only really hits at a reduction in size of 94% (second to smallest).
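Because each chart takes half the area of the one before, the reductions quoted above follow from repeated halving. For example, the fifth chart (second to smallest) is 1 − 1/16 = 93.75% smaller, which is the ‘94%’ in the text. The sketch below also prints the linear scale of each side. That figure is an addition not in the text and assumes each chart is shrunk uniformly, so widths, heights and text size scale with the square root of the remaining area.

```python
import math


def area_reductions(n_charts):
    """% reduction in area of each chart when every chart is half the
    area of the one before it."""
    return [100 * (1 - 0.5 ** n) for n in range(n_charts)]


for n, reduction in enumerate(area_reductions(6)):
    # Uniform shrinking: sides (and text) scale with sqrt(area share).
    side = math.sqrt(0.5 ** n)
    print(f"chart {n + 1}: area reduced {reduction:.2f}%, sides x{side:.2f}")
```

On this assumption, a 75% reduction in area halves the text size. This fits the point at which reading off exact values starts to suffer.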

This all raises the question: what exact message are you trying to get across? Charts are not good ways to impart reference information. Sometimes you do want to say something about the detail, but what resolution do you need to impart even this information? This is not an argument to reduce all charts to a microscopic level. But an appropriate use of smaller charts allows a much better integration of text and charts, gives you greater flexibility about the placement of charts and reduces the likelihood of large areas of white space. It also means you are much better able to compare multiple charts, as you can fit more on a single page/screen.²

If the text in your chart gets in the way of reducing the data area to the size you want, then reduce the number of labels on each axis and consider changing the title. Title text can be reduced or even removed by giving additional details in accompanying text or section headings.

[Figure: The same series (€0 to €35, April 2005 to October 2008) plotted six times, each chart taking up half the area of its predecessor]

² For a very thorough outline and discussion about the use of even smaller charts see the Sparklines: theory and practice topic on www.edwardtufte.com
If you look at the charts on the financial pages of any non-tabloid newspaper, you will find plenty of charts which are much smaller than ours and generally include more data (hence much higher data density). We may not aspire to repeat much of their work on statistics, but these are good examples of charts integrated with text and with a high data density.

Dimensions
The default aspect should be landscape. It makes writing horizontal text more straightforward, and the human eye is well used to perceiving small deviations from a line moving left to right (vertical variations from the horizon). Much of the time the default aspect ratio given by Excel to chart objects (1.8:1) is fine. The actual aspect of the plot area will be somewhat different depending on title, axis labels etc. Some of the time a slight adjustment to the height/breadth is made to allow for a slightly longer title or to ensure all category labels fit in.

There are some other times where you might want to change the aspect ratio for more technical reasons:

- Variations in the gradient of very spiky trends are very hard to judge. Changing the aspect ratio to increase the breadth:height ratio helps, as people can better judge variations in slopes where they average 45° (see opposite).³
- More than one chart with different date or value ranges. Where similar data is present in a similar format on the same page, the user will make comparisons whether that is your intention or not. You should try as much as possible to align the values/dates to ensure that a set distance means the same value or date range in each chart. If all dimensions are the same yet data ranges are different, the reader may make incorrect inferences. In an ideal world the alignment should be exact (do this where the point is to make direct comparisons), but an approximate method is an improvement for indirect comparisons (opposite).
- Scatter plots. Square dimensions make sense where the angle of any line of best fit is important.
- Expanded or composite charts. Where more than one chart is included in a display, the focus should be on the aspect ratio of the individual parts. The final composite ratio will frequently be portrait (see opposite).
- Bar charts with lots of (sub) categories. There is no sense in keeping a landscape layout if it makes the chart unreadable (see below).

[Figure: Here a trend of data on sunspot numbers is illustrated, 1701 to 2001. The point is that the decline from the peak is generally more gradual than the increase. This is only apparent in the second chart, which takes up around 1/4 of the area. Source: SIDC-team, World Data Center for the Sunspot Index, Royal Observatory of Belgium, Monthly Report on the International Sunspot Number, online catalogue of the sunspot index: http://www.sidc.be/sunspot-data/]

[Figure: The second chart is extended to make the dates broadly align. This helps the reader identify common patterns or diverging trends. Proportion of pupils in large classes (>30), England 1978- (primary and secondary schools); Average size of UK maintained schools 1950- (primary and secondary)]

[Figure: Breakdown of religion by classified NS-SEC, England & Wales 2001 (% of each religious group), composite chart with one panel each for classes 1 and 2; 3, 4 and 5; 6 and 7; and class 8]

[Figure: Change in Democrat % share of the vote 2004-08, percentage points, bar chart by sex, age, first-time voters, ethnicity, religion, income and education]

³ This chart updates the one first produced by William S. Cleveland in Visualizing Data (1993) as an illustration of the method of choosing aspect ratios where slopes bank to 45°.
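Banking to 45° can be approximated in several ways. The sketch below uses one simple variant, the median absolute slope method. It chooses the plot area’s height:width ratio so that the median line segment is drawn at 45°. On screen, a segment’s slope is its data slope × (x range / y range) × (height / width). Setting the median of that to 1 gives the ratio returned below. Treat this as an illustration of the idea, not the exact procedure behind the sunspot chart, whose data are not reproduced here.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Height:width ratio of the plot area that banks the median
    absolute segment slope to 45 degrees (median absolute slope method)."""
    slopes = [abs((y2 - y1) / (x2 - x1))
              for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])
              if x2 != x1]
    m = median(slopes)
    if m == 0:
        raise ValueError("flat series: no aspect ratio banks it to 45 degrees")
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    return y_range / (x_range * m)


# A hypothetical spiky series: steep segments call for a wide chart.
ratio = bank_to_45([0, 1, 2, 3, 4], [0, 10, 0, 10, 0])
print(f"height:width = {ratio}, i.e. breadth:height about {1 / ratio:.0f}:1")
```

A straight line already at a constant slope comes out square (ratio 1.0), while spiky series come out much wider than high, as in the sunspot example.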
Chart types
Again, there are no hard and fast rules that cover every instance. Some types of charts lend themselves to certain types of data. The principles of economy, self-sufficiency and clarity should be followed when deciding between different types of chart. The first two should be self-evident in most cases, but clarity needs some more thought.

Clarity
Research has shown that people’s ability to accurately perceive values, or differences in values, varies among the different ways that charts display values. They are, in order of effectiveness:⁴

1. Position along a common scale
2. Position along a non-aligned scale
3. Length
4. Angle/slope
5. Area
6. Volume
7. Colour

Much of this should come as little surprise. It is easier to understand values expressed in one dimension than in two. Likewise, it is easier to understand two dimensions than three. Using colour as the sole indicator of value is poor (there is little choice for maps). Some of the other rankings are slightly more nuanced. It is easier to judge the value from a point on a scale (1) than to compare points on two identical scales on different charts (2). It is easier to judge values from a common baseline (1 or 2) than to compare different elements from a stacked chart (3), where only the first series has a common baseline (opposite).

[Figure: Examples of categories 1, 2 and 3: 1. Identification/comparison of the value of any different category within any of the charts on the left. 2. Comparison of the value of one category in one of the charts on the left with any other in a different chart on the left. 3. Identification/comparison of the second or third stacked series in the chart on the right.]
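Why only the first stacked series has a common baseline can be shown by computing where each segment starts. Below is a minimal sketch with hypothetical data. The second series has the same value in every category, but its segments start at different heights. The reader therefore has to compare lengths (item 3 in the list) rather than positions on a common scale (item 1).

```python
def stacked_baselines(series):
    """Baseline (bottom edge) of every segment in a stacked bar/column
    chart, where series[i][j] is series i in category j."""
    running = [0] * len(series[0])
    baselines = []
    for s in series:
        baselines.append(list(running))
        running = [r + v for r, v in zip(running, s)]
    return baselines


# Two hypothetical series across three categories.
series = [[3, 1, 2],  # first series: every segment starts at 0
          [2, 2, 2]]  # second series: starts wherever the first ended
print(stacked_baselines(series))  # [[0, 0, 0], [3, 1, 2]]
```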
⁴ William S. Cleveland and Robert McGill, ‘Graphical Perception: The Visual Decoding of Quantitative Information on Graphical Displays of Data’, Journal of the Royal Statistical Society, Series A (General), Vol. 150, No. 3 (1987), pp. 192-229.

The conclusions drawn, and the resulting charts suggested, by the authors of this research are not universally accepted. Even so, some elements from their findings can be used when deciding between ‘standard’ charts. The first step is to match the ranked list of elements (as on the previous page) to chart types. Some chart types use more than one element, but their primary one is:

1. Line chart, simple bar/column chart, dot plot, scatter plot
2. Comparison of more than one of the above with a common scale
3. Stacked column/bar, area chart (1-3 opposite)
4. Pie chart
5. Pie chart, histogram, pictogram, bubble charts
6. Pictogram, 3-D chart (falsely)
7. None apart from maps

[Figure: In a bar chart you are primarily judging position along a scale (effectively the position of the dot imposed here); length and area have secondary roles. In a stacked bar, for all but the first series you are judging the length (dotted line) when making comparisons.]

What does this tell us about broad chart types? First, simple charts are better. Second, pie charts are less clear than bar/column charts. Third, stacked charts (3) are less clear than a sequence of simple line/bar/column charts (see below opposite). Stacked charts become less clear with more stacked series. If only two series are included, the difference in clarity between this and other options is much smaller.
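The matched list can be used as a simple lookup when choosing between candidate chart types. Below is a minimal sketch. The dictionary keys are our own shorthand, including the hypothetical ‘small multiples’ for item 2. Where the list puts a chart type under two ranks (pie under 4 and 5, pictogram under 5 and 6), the sketch keeps the better one. It encodes clarity only, so accuracy, economy and self-sufficiency still need judging separately.

```python
# Primary perceptual task for each chart type, from the numbered list
# (1 = position along a common scale ... 7 = colour). Keys are shorthand.
PRIMARY_RANK = {
    "line": 1, "bar": 1, "column": 1, "dot plot": 1, "scatter": 1,
    "small multiples": 2,  # hypothetical name for item 2
    "stacked bar": 3, "stacked column": 3, "area": 3,
    "pie": 4,
    "histogram": 5, "pictogram": 5, "bubble": 5,
    "3-D": 6,
}


def clearest(candidates):
    """Return the candidate chart type with the best (lowest) rank;
    ties go to the first candidate listed."""
    return min(candidates, key=PRIMARY_RANK.__getitem__)


print(clearest(["pie", "bar", "stacked bar"]))  # bar
```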
Clarity is not our only consideration; accuracy, economy and self-sufficiency still apply. Here self-sufficiency can be defined both as a lack of need for labels, values etc. to be printed on data series and as familiarity with the type of chart (it needs no explanation). Pie charts are very familiar to everyone. However, their lack of clarity is illustrated by the fact that nearly all have values and/or percentages listed next to each slice. People find it hard to judge accurately the absolute and relative size of each slice. So anything but the simplest pie needs labels, and where labels are not needed the two or three pieces of data the pie contains are probably better in the text. Therefore, while very common, the pie chart falls down on the principles of self-sufficiency and economy. It has other limitations (more on this in the next section) and therefore should only be used in exceptional circumstances.

[Figure: A ‘standard’ stacked bar, expanded and re-ordered to help make the underlying patterns clearer: Breakdown of religion by classified NS-SEC, England & Wales 2001 (% of each religious group)]

Chart types – pluses and minuses
The table below summarises the main advantages and disadvantages of the most common chart types.
Chart type | Positive | Negative
Column | Simple, clear, recognisable, works for categories and time series | Trends less clear for very long time series, small space for long category names, inflexible
Bar | Simple, clear, recognisable, works for categories including those with long names, good for a very large number of categories | Not appropriate for time series, less recognisable than a column chart
Line | Simple, clear, recognisable, works for time series and index charts | Data markers can be clunky, not appropriate for category charts, interpolation of gaps, stacked charts not clear
Area | Recognisable, works for time series and stacked charts | Not for category charts, less flexible and much more data ink than a line chart
Pie | Simple, recognisable | Difficulty in perceiving values especially with more than a few slices, needs labels, inflexible, cannot be combined with other types, no time series charts, never looks right in Excel
Scatter | The only choice for comparing two variables, correct interpretation of date values | Limited other uses

This gives us a basic starting point only. The next few sections outline some more areas in greater detail.


Time series

Time series are normally illustrated with a column or line chart (area chart for stacked lines). The main differences are the greater amount of data ink used for a column chart and the interpolation of a line chart. A line chart is really a series of dot markers linked by lines. The line itself gives a sense of flow and continuity and emphasises the shape of the trend. Lines are much better when more than one series is being illustrated, especially when there are large numbers of time periods (see the alternatives below). They are the only option for index charts.

[Figure: Typical retail price of premium unleaded petrol and diesel, pence per litre, January 1994 to January 2008, drawn once as a line chart and once as a column chart. Which is clearer?]

There may be times where the underlying data lacks continuity and simply joining the dots gives the wrong impression. A column trend chart may be preferable in such cases, and where the chart looks at positive and negative variations with no long-term trend. However, these are the exception and line charts should be the default option for time series charts.

An area chart should be preferable to a stacked bar as it conveys the sense of continuity. An expanded or composite chart is clearer than a stacked area chart and is covered in more detail below. When there are only a few time periods to illustrate, a line can look lost and meaningless while a column chart looks better (see below). This is due to the greater amount of data ink in the column chart. Where so few pieces of data are included, a chart may not be the best option.

[Figure: The column chart seems to justify itself more, but the underlying data are the same.]

Dual axis and combined charts


These are charts with two or more related series where one set of values has a different unit of measurement or is much smaller than the other. In such cases it needs to be plotted on a secondary value axis if it is to be included in the same chart. To help emphasise this, the chart type of one series is sometimes changed (say from a line to a column), hence a combined chart (see below). A dual axis chart that has combined chart types may be slightly harder to misunderstand, but it is still potentially misleading.

[Figure: National rail, public spending and passenger numbers, 1986-87 to 2006-07, plotted twice with spending (£ billion) on the left-hand scale and passengers (billion) on the right-hand scale, using different scales each time. First version: here growth in spending seems to far outstrip that in passenger numbers. Second version: here the growth in spending appears to barely close the gap with passenger numbers.]

With a single value axis and more than one series, crossing points and gaps are an essential part of the information in a chart, for example with voting intentions. Crossing points and gaps on a dual axis chart do not have the same fundamental meaning as on a chart with a single axis; they are a function of the scales. The examples above plot the same data with different scales and could be interpreted in different ways.

This effect would be more severe were category axes to be cut. Adding connected series to a chart will encourage the reader to think about cause and effect. This may well be the idea, but dual axis charts can give even a knowledgeable reader the wrong impression. These charts should therefore not be used.
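The point that crossing points on a dual axis chart are a function of the scales can be checked with a short Python sketch (Python is not otherwise used in this guide). It assumes each series is drawn as a fraction of its own value axis range, as spreadsheet charts do; the function names and all figures are hypothetical and are not the rail data.

```python
def plotted_heights(values, axis_max, axis_min=0.0):
    """Map data values to positions on a 0-1 scale for one value axis."""
    span = axis_max - axis_min
    return [(v - axis_min) / span for v in values]


def crossing_points(a, b):
    """Return each index i where series a and b swap order between i and i+1."""
    return [
        i
        for i in range(len(a) - 1)
        if (a[i] - b[i]) * (a[i + 1] - b[i + 1]) < 0
    ]


# Hypothetical figures for illustration only.
spending = [2.0, 2.5, 3.0, 4.0, 5.0]      # left-hand axis, £ billion
passengers = [0.7, 0.75, 0.8, 0.95, 1.1]  # right-hand axis, billion

left = plotted_heights(spending, axis_max=6)
for right_max in (1.2, 2.0):
    # Same data each time; only the right-hand axis maximum changes.
    right = plotted_heights(passengers, axis_max=right_max)
    print(right_max, crossing_points(left, right))
```

With a right-hand axis maximum of 1.2 the two lines never cross; raising it to 2.0 pulls the passenger line down so that the spending line crosses it between the first and second periods. Nothing about the data has changed, only the scale.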

There are alternatives. An index chart of all the series will at least not be affected by the choice of scale and is a very good illustration of relative changes. But the choice of index year does affect gaps and crossing points of series, and the data have been stripped of their absolute values. Two or more separate charts placed next to each other will avoid these problems. These can be reduced in size to keep the same data density as the original dual axis chart and are more flexible as they can accommodate different date ranges more easily (see below).

[Figure: National rail passengers (billion) and National rail public spending (£ billion), 1986-87 to 2006-07, as two separate charts. The alternative uses up a similar amount of space but is less likely to be misunderstood.]
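The effect of the choice of index year on an index chart can be sketched the same way. The helper names and the two series below are hypothetical; each series is rebased so that its chosen base period equals 100, and the gap between the two indexed series is printed for each period.

```python
def rebase(values, base_period):
    """Return the series as an index with values[base_period] = 100."""
    base = values[base_period]
    return [100 * v / base for v in values]


def gaps(a, b):
    """Gap between two indexed series (b minus a) in each period, to 1 d.p."""
    return [round(y - x, 1) for x, y in zip(a, b)]


# Hypothetical series in different units.
series_a = [50, 60, 70, 80]
series_b = [20, 22, 30, 36]

for base in (0, 1, 3):
    print(base, gaps(rebase(series_a, base), rebase(series_b, base)))
```

With the first period as base, series B is 10 points below A in the second period and above it afterwards; with the second period as base, B is never below A; with the last period as base, B is never above it. The underlying data are identical in all three cases.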

Composite or expanded charts

These have been mentioned in the sections above on aspect ratio and clarity and on dual axis charts. Fundamentally they are a multi-chart display intended to let the reader make direct comparisons between the (related) data. Each individual chart has two elements (value and either category, time or another value) for an individual category. Different charts have the same two elements for different categories, which on a single chart would either be stacked or placed next to each other.

[Figure: A rather muddled column chart with two categories is changed to an expanded chart which gives a much better idea of the pattern by category within each country and still allows comparison between countries. Panels: Country A, Country B, Country C; categories A to F.]

Because each single chart has a single series, there are no problems with the clarity of stacked series and no jumble of multiple crossing lines or confusing side-by-side columns (see the examples in this section and the religion chart on p.9). Normally the individual charts will stack ‘upwards’ as the shared time periods or categories are along the x-axis. They could stack upwards and left to right if two additional categories are added.
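The reshaping behind a composite chart can be sketched in Python. This is an illustration under assumptions, not an Excel procedure: split_into_panels and the country figures are hypothetical. Each series in a multi-series table becomes its own single-series panel over the shared categories, and every panel gets one common value-axis range so the panels can be compared directly when stacked upwards.

```python
def split_into_panels(categories, table):
    """Split a multi-series table into single-series panels.

    table maps each series name to values aligned with categories.
    Returns (panels, axis): panels keeps the table's order, which can serve
    as the stacking order; axis is one (low, high) value range shared by
    every panel, starting at zero unless there are negative values.
    """
    all_values = [v for values in table.values() for v in values]
    axis = (min(0, min(all_values)), max(all_values))
    panels = {name: list(zip(categories, values)) for name, values in table.items()}
    return panels, axis


# Hypothetical data: three countries, six categories.
categories = ["A", "B", "C", "D", "E", "F"]
table = {
    "Country A": [3, 5, 2, 7, 4, 6],
    "Country B": [6, 2, 4, 1, 5, 3],
    "Country C": [2, 4, 6, 3, 1, 2],
}

panels, axis = split_into_panels(categories, table)
for name, points in panels.items():
    print(name, points)
print("shared value axis:", axis)
```

Using one range for every panel is what lets the reader compare heights across panels; giving each panel its own range would bring back the scale problem discussed under dual axis charts.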

[Figure: Index prices of selected fuel components of the RPI, indices relative to the all items RPI, January 1987=100, 1987 to 2007. Coal and smokeless fuels, gas, electricity and heating oils are shown on one chart and then as separate panels. Separating out the individual categories in this example helps the reader identify trends that are simply not apparent in the original. Source: ONS series DOBW, DOBY, DOBX, DOBZ and CHAW]
These composite charts all share two elements but are varied by a third and sometimes a fourth. Some of their advantages also apply to multiple chart displays that do not all share two elements but have different value axes. The example at the bottom of page 15 showed an improvement on a dual axis chart. There the charts could not have the same value axis, as similar vertical movements in a trend meant different things. But their underlying data were broadly connected and they shared dates. The result is not a true composite chart, but a related multiple chart display that follows the principles outlined at the start and is aimed at helping the reader better understand the evidence. The example from page 8 (repeated below) only shares some of the dates, and when placed on the same page the varying date ranges would normally mean non-aligned dates for the related data. Here they are broadly aligned (the second one is lengthened) and some related trends are clear. The alternative would be to lose 20 years of data from the second chart, which would mean ignoring evidence and reducing data density.

[Figure: Proportion of pupils in large classes (>30), England, from 1978 (primary and secondary schools), above Average size of UK maintained schools, from 1950 (primary and secondary), with the date axes broadly aligned.]

Producing true composite charts in Excel is not straightforward, but it becomes easier with practice. An alternative is the ‘broadly aligned’ approach, where chart sizes are changed manually to align the scaling of value and category axes.
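The arithmetic of the ‘broadly aligned’ approach can be sketched as follows. It assumes the aim is for one year to take the same horizontal distance in both plot areas and that widths refer to the plot area only (axis labels and margins ignored). The function names and the 550-unit width are hypothetical; the year ranges follow the axis labels of the two school charts.

```python
def per_year(width, start, end):
    """Horizontal distance taken by one year in a plot area."""
    return width / (end - start)


def align_to(reference, start, end):
    """Width and left offset for a plot spanning start..end so that its
    years line up with a reference plot given as (width, start, end)."""
    ref_width, ref_start, ref_end = reference
    step = per_year(ref_width, ref_start, ref_end)
    width = step * (end - start)
    offset = step * (start - ref_start)  # distance from the reference's left edge
    return width, offset


# Reference: the longer chart, 1950 to 2005, with a hypothetical 550-unit plot area.
width, offset = align_to((550, 1950, 2005), 1978, 2006)
print(width, offset)
```

Here one year takes 10 units, so the 1978 to 2006 plot should be 280 units wide and start 280 units in from the left of the longer chart (the two figures coincide only because both spans are 28 years). Its right edge then sits 10 units beyond the longer chart's, matching the extra year, 2006.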
Expanded or composite charts should be considered as alternatives to stacked or side-by-side charts for both category and time series data. There will still be occasions where stacked or side-by-side charts are appropriate (generally with small numbers of series). The point of highlighting an alternative approach is to illustrate their weaknesses and to make people think about the message they are trying to convey with a chart and the best way to achieve this.
Further online reading

Perceptual Edge
Juice Analytics
OECD factblog on data visualisation
Department of Mathematics and Statistics at York University (Canada)
Gallery of Data Visualization: The Best and Worst of Statistical Graphics
Process Trends: Data Analysis and Visualization with Excel, R and Google Tools
Presenting data, by the Local Government Data Unit (Wales)

Paul Bolton
January 2009