Data Visualization
Dudoit
Version: 05/02/2019, 17:17

1 Motivation
2 Principles of Data Visualization
2.1 Do We Really Need a Graph?
2.2 General Considerations
2.3 Graphical Perception
2.4 Bad Graphs
3 Survey of Data Visualization Techniques
3.1 One Quantitative Variable
3.2 Multiple Quantitative Variables
3.3 One Qualitative Variable
3.4 Multiple Qualitative Variables
3.5 Conditional Plots
Data Visualization
“One picture worth ten thousand words.”
Frederick R. Barnard, Printers’ Ink, March 10th, 1927.
An Oldie But Goodie

Figure 2: Bitcoin wealth distribution. http://viz.wtf/image/166329900475.
Data Visualization
One picture worth ten thousand words.
• Only if it is a good picture.
• We tend to be more demanding with text than with graphics.
• How long does it take to write/read one thousand words? At least the same effort should be put into making/viewing a graph.
Learning Objectives
• Become a wise and effective “creator”/“maker” as well as “reader”/“viewer” of data visualization.
• Master general principles for data visualization and apply these when making your own graphs as well as when viewing others’.
• Produce the right graph for the matter at hand.
• Become aware of the variety of graphical techniques available for different types of data and purposes and understand their pros and cons. Go beyond histograms and pie charts!
• Think more carefully about each plot you create, consider the pros and cons of different choices, and try several different plots for a given dataset.
Data Visualization
• Data visualization is a fundamental aspect of Data Science.
• It is essential to “look at data” throughout the workflow, from exploratory data analysis (EDA) to model diagnostics and reporting the results of the inquiry.
• Visualization is valuable for detecting the main features (good or bad) of a dataset, revealing patterns, and suggesting theories or further questions.
• Visualization is also useful for quality assurance/quality control (QA/QC) and detecting problems with the data.
• An effective plot can be good enough to answer the question on its own. In some cases, it may even be the only appropriate type of answer.
Data Visualization
• An effective plot can also be sufficient to convince stakeholders of the findings from a full-blown statistical inference procedure.
Data Visualization
• Although data visualization is ubiquitous and heavily relied upon, in research as well as in the media, typically not much thought is put into creating or reading plots.
  ▸ Creators often rely on very limited subsets of plots, without proper consideration of their limitations.
  ▸ Readers often passively absorb a message imposed on them by the graph, rather than reason and think critically about it.
• Very few Statistics, Computer Science (CS), or domain curricula offer courses in data visualization.
• Proper data visualization is nontrivial. Entire courses could and should be devoted to data visualization, including discussions of vision and perception to guide the design of effective graphs.
Do We Really Need a Graph?
[Figure: two pie charts. Left: Trump 48.9%, Clinton 51.1%. Right: Trump 46.1%, Clinton 48.2%, Others 5.7%.]
Figure 3: US Election Results 2016. Left: Pie chart of percentage of popular vote for Trump and Clinton. Right: Pie chart of percentage of popular vote for Trump, Clinton, and other candidates. Why the different percentages on left and right?
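One plausible answer to the question posed in Figure 3 is that the left chart drops the other candidates and rescales the two remaining shares to sum to 100%. The sketch below checks this using the percentages shown in the right-hand chart (Trump 46.1%, Clinton 48.2%, Others 5.7%); the function name `renormalize` is ours and purely illustrative.

```python
def renormalize(shares, keep):
    """Rescale the kept categories so that their percentages sum to 100."""
    total = sum(shares[name] for name in keep)
    return {name: 100.0 * shares[name] / total for name in keep}


# Percentages of the popular vote from the right-hand pie chart in Figure 3.
all_candidates = {"Trump": 46.1, "Clinton": 48.2, "Others": 5.7}

# Drop "Others" and rescale, as the left-hand chart appears to do.
two_way = renormalize(all_candidates, keep=["Trump", "Clinton"])
for name, pct in two_way.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded to one decimal, the rescaled shares are 48.9% and 51.1%, matching the left-hand chart. The same data therefore yield two different pictures depending on which categories the creator chooses to display.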
From Tables to Graphs
Figure 4: Turning tables into graphs (Gelman et al., 2002, Figure 2). Counts and rates of citations of various professions from the New York Times database. Graph: Log-log scale allows comparison across several orders of magnitude. Any 45° line indicates constant relative frequency. The relative positions of the different professions are clearer.
More Oldies But Goodies
Figure 5: Album de Statistique Graphique (1881). https://www.davidrumsey.com/.
More Oldies But Goodies: Maps
Figure 6: Album de Statistique Graphique (1881). Train load (scaled by length of line) is represented by thickness of bands. How would you represent this data without a graph?
More Oldies But Goodies: Graphical Timetables
Figure 7: Marey (1885). Train schedule Paris–Lyon, 1880s. https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0003zP. How would you represent this data without a graph?
More Oldies But Goodies: Graphical Timetables
Figure 8: Marey (1885). Train schedule Paris–Lyon with TGV, 1980s vs. 1880s. The red line indicates the 1981 itinerary of the TGV, a new express train that cut the trip from Paris to Lyon to under three hours (vs. nine hours in the 1880s).
More Oldies But Goodies: Graphical Timetables
Figure 9: Train schedule SF–Gilroy, now. https://i.stack.imgur.com/qJ1hH.
More Oldies But Goodies: Graphical Timetables
• In Marey (1885)’s Paris–Lyon graphical train schedule in the 1880s, time is represented on the x axis and the stations and distances between stations are represented on the y axis (Tufte, 2001).
• A train’s itinerary is represented by a line.
• The slope of the line reflects the speed of the train: The more nearly vertical the line, the faster the train.
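The link between slope and speed in a graphical timetable can be made concrete. With time on the x axis and distance on the y axis, the slope of an itinerary line is distance divided by elapsed time, i.e., the average speed. The sketch below is illustrative only: the 500 km distance and the departure times are assumed round figures, not values read from Marey's schedule. The nine-hour and three-hour durations echo the 1880s vs. TGV comparison.

```python
def average_speed_kmh(distance_km, depart_hour, arrive_hour):
    """Slope of the itinerary line: distance covered divided by elapsed time."""
    return distance_km / (arrive_hour - depart_hour)


DISTANCE_KM = 500.0  # assumed round figure, for illustration only

# Hypothetical departures at 8:00; only the trip durations matter here.
slow = average_speed_kmh(DISTANCE_KM, depart_hour=8.0, arrive_hour=17.0)  # 9-hour trip
fast = average_speed_kmh(DISTANCE_KM, depart_hour=8.0, arrive_hour=11.0)  # 3-hour trip

print(f"9-hour trip: {slow:.0f} km/h; 3-hour trip: {fast:.0f} km/h")
print(f"Slope ratio (fast/slow): {fast / slow:.1f}")
```

Whatever distance is assumed, the three-hour line is three times as steep as the nine-hour line, which is why the faster train appears more nearly vertical.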
• Graphs should attempt to summarize data in a simple, intuitive, and efficient manner, without distorting or losing important information.
• However, not all good graphs are simple. As with text, plots conveying a lot of information (e.g., displaying multiple variables) require both a skillful creator and an educated reader. E.g. Minard’s graph for Napoleon’s Russia campaign, old graphical train schedules.
• There is no “one-size-fits-all” graph, i.e., different types of graphs should be used for different
  ▸ types of data, e.g., quantitative, qualitative variables;
  ▸ purposes, e.g., debugging code, EDA, reporting results;
  ▸ media, e.g., print journal, projector.
Caveats
E.g. Histograms map n data points into B < n bins; boxplots map n data points into 5 summary statistics (+ possibly outliers).
• By focusing on certain aspects of the data or even imposing structure on data, graphs can also be subjective. E.g. Choosing which variables to plot, decisions regarding axes and scales, dendrogram representation of clusters.
• As with text, the creator of the plot makes editorial decisions as to which data to display and which aspects of these data to show or emphasize.
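The data reduction performed by histograms and boxplots can be sketched in a few lines. The example below is a minimal illustration, not a plotting routine: the helper names, the equal-width binning rule, and the sample values are our own assumptions, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is only one of several conventions.

```python
import statistics


def histogram_counts(data, num_bins):
    """Map n data points into num_bins equal-width bin counts."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins  # assumes the data are not all identical
    counts = [0] * num_bins
    for x in data:
        # The maximum value is placed in the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def five_number_summary(data):
    """Map n data points into (min, Q1, median, Q3, max), as in a boxplot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))


# Hypothetical sample of n = 10 values, for illustration only.
data = [2.1, 3.4, 3.9, 4.0, 4.2, 4.8, 5.1, 5.5, 6.3, 9.7]

counts = histogram_counts(data, num_bins=4)
summary = five_number_summary(data)
print("Histogram:", len(data), "points ->", len(counts), "bin counts:", counts)
print("Boxplot:  ", len(data), "points ->", len(summary), "statistics:", summary)
```

In both cases the individual observations cannot be recovered from the summary, which is the sense in which such graphs lose information.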
Caveats
• The reader should assess the relevance and reliability of the data being displayed, as well as the appropriateness of the graph.
General Considerations
In the process of creating a plot, you should consider the following issues.
• Determine the purpose of the plot. E.g. EDA, debugging code, comparing distributions, model diagnostics, summarizing results, reporting results.
• Formulate the message.
• Identify the audience.
• Choose color palette carefully. E.g. Be mindful of color blindness, use different color schemes for different types of data and messages (e.g., sequential, qualitative, and diverging).
• Provide sufficient information so that the plot can be interpreted properly. E.g. Title, axis parameters (i.e., label, tick marks), annotation, legend, caption, etc. In a document, number the figures and tables.
• Do not include irrelevant information, i.e., avoid “chart junk”.
• Principle of “least surprise”: If you defy expectations, people may get confused. Only defy expectations if it is very important.
General Considerations
• Experiment, i.e., consider different types of plots and update the plots iteratively.
• Of course, always think about the quality of the data you plot.
General Considerations
information?
  ▸ For larger sample sizes, plot relevant summaries of the data that do not distort or lose important information in the data.
• Variables to display/emphasize. Depends on the purpose and message of the plot.
• Type of variables. Quantitative and qualitative variables call for different types of graphical summaries.
• Pre-processing. E.g. Transformation (e.g., log), dimensionality reduction, imputation.
Graphical Perception
• There are empirical laws for perception that can be used to rank different types of graphical encodings.
• In general, such laws relate the perceived (change in) intensity in a physical stimulus to the actual (change in) intensity. This concerns stimuli to all senses, i.e., vision, hearing, taste, touch, and smell.
Graphical Perception: Weber’s Law
• Weber’s Law is an empirical relationship in psychophysics between the initial intensity in a stimulus (I) and the smallest perceivable difference (a.k.a., just noticeable difference) in the stimulus intensity (ΔI):
  \[ \frac{\Delta I}{I} = k, \tag{1} \]
  where k is a proportionality constant for a given type of stimulus².
• In terms of length, this means we detect a 1 cm change in a 1 m length as easily as we detect a 10 m change in a 1 km length.
• Weber’s Law appears to hold for many different graphical encodings.
² Law formulated and published by Gustav Theodor Fechner (1801–1887), a student of Ernst Heinrich Weber (1795–1878).
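Weber's Law, ΔI / I = k, can be rearranged as ΔI = k · I, so the just noticeable difference grows in proportion to the baseline intensity. The sketch below reproduces the length example. The value k = 0.01 is simply the ratio implied by that example (1 cm in 1 m), not a measured constant, and the function name is ours.

```python
def just_noticeable_difference(intensity, weber_fraction):
    """Smallest perceivable change, Delta I = k * I, under Weber's Law."""
    return weber_fraction * intensity


# Ratio implied by the length example (1 cm / 100 cm); illustrative only.
K_LENGTH = 0.01

jnd_for_1_m = just_noticeable_difference(100.0, K_LENGTH)    # baseline in cm
jnd_for_1_km = just_noticeable_difference(1000.0, K_LENGTH)  # baseline in m

print(f"Baseline 1 m:  change of about {jnd_for_1_m:g} cm is just noticeable")
print(f"Baseline 1 km: change of about {jnd_for_1_km:g} m is just noticeable")
```

The two changes differ by a factor of 1000 in absolute terms but are equally detectable, because both equal the same fraction k of their baseline.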
Graphical Perception: Stevens’ Law
• Stevens’ (1957) Law is an empirical relationship in psychophysics between the intensity in a stimulus and the perceived magnitude of the sensation created by the stimulus:
  \[ \psi(I) = C I^{\beta}, \tag{2} \]
  where I is the intensity or strength of the stimulus in physical units (energy, weight, pressure, mixture proportions, etc.), ψ(I) is the magnitude of the sensation, β is an exponent that depends on the type of stimulation or sensory modality, and C is a proportionality constant that depends on the units used.
• Examples of values for exponent, β:
  Length: 0.9 – 1.1
  Area: 0.6 – 0.9
  Volume: 0.5 – 0.8
Graphical Perception: Stevens’ Law
• For lengths, the relationship is almost linear, thus our perception is about right.
• However, according to this power law, our perception of areas and volumes is conservative, i.e., when values are represented as areas or volumes, we underestimate the large values relative to the small ones and overestimate the small ones relative to the large ones.
• E.g. Areas, with β = 0.7. Consider two areas of size 1 and 2, respectively.
  \[ \frac{\psi(2)}{\psi(1)} = \frac{2^{0.7}}{1^{0.7}} \approx 1.62. \]
  Thus, we don’t see the bigger area as twice as large.
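The area example follows directly from Stevens' power law ψ(I) = C I^β, since the constant C cancels in a ratio of two perceived magnitudes. The sketch below reproduces that ratio and also inverts the law to ask how much larger an area must be to look twice as large. The function names are ours, and β = 0.7 is the illustrative exponent used in the example.

```python
def perceived_ratio(small, large, beta):
    """Ratio psi(large) / psi(small) under psi(I) = C * I**beta (C cancels)."""
    return (large / small) ** beta


def actual_ratio_needed(target_perceived_ratio, beta):
    """Actual intensity ratio required to produce a given perceived ratio."""
    return target_perceived_ratio ** (1.0 / beta)


BETA_AREA = 0.7  # illustrative exponent for area

seen = perceived_ratio(1.0, 2.0, BETA_AREA)
needed = actual_ratio_needed(2.0, BETA_AREA)

print(f"An area twice as large is perceived as about {seen:.2f} times as large")
print(f"To look twice as large, an area must be about {needed:.2f} times as large")
```

Under this exponent, a doubled area reads as roughly 1.62 times as large, and it takes an area roughly 2.7 times as large to read as double. This is one reason encodings based on area tend to understate large values.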
Graphical Perception: Stevens’ Law
[Figure: Perceived Intensity (y axis, 0 to 5) against Actual Intensity (x axis, 0 to 5), with power-law curves labeled by exponent: electric shock 3.5, saturation 1.7, length 1, area 0.7, depth 0.67, brightness 0.5.]
Figure 10: Graphical perception: Stevens’ Law. Stevens (1957) perceived sensory magnitude power law.
Graphical Perception: Combining Weber’s and Stevens’ Laws
• Cleveland and McGill (1985) carried out an extensive study of graphical encodings to obtain a best to worst ranking.
• The encodings they examined include: position on a common aligned scale, position on a common unaligned scale, length, slope, angle, area, volume, color hue, brightness, and purity.
• One of their experiments consisted of
  ▸ 7 graphical encodings,
  ▸ 3 judgments per encoding,
  ▸ 10 replications per subject,
  ▸ 127 experimental subjects.
  Assessment criterion: error = |perceived p − true p|, where p denotes the ratio (in percentages) of the smaller to the larger magnitude.
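The assessment criterion is simple to compute once p is defined as the smaller magnitude expressed as a percentage of the larger one. The sketch below applies it to a single hypothetical judgment. The magnitudes (30 and 75) and the subject's answer (45%) are made up for illustration and are not data from the study; the function names are ours.

```python
def true_percentage(smaller, larger):
    """p: the smaller magnitude as a percentage of the larger one."""
    return 100.0 * smaller / larger


def judgment_error(perceived_p, true_p):
    """Absolute error |perceived p - true p|, in percentage points."""
    return abs(perceived_p - true_p)


# Hypothetical trial: two encoded magnitudes and one subject's judgment.
true_p = true_percentage(smaller=30.0, larger=75.0)
error = judgment_error(perceived_p=45.0, true_p=true_p)

print(f"True p = {true_p:.0f}%, judged 45%, error = {error:.0f} percentage points")
```

Averaging such errors over judgments, replications, and subjects for each encoding is what allows encodings to be ranked from most to least accurately perceived.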
Graphical Perception
Figure 11: Graphical perception. Based on Table 1 in Cleveland and McGill (1985). http://paldhous.github.io/ucb/2016/dataviz/week2.html.
Bad Graphs
• The literature is full of “bad graphs” that, for instance, distort the data and are misleading, are too complicated, or are missing essential information.
• Karl Broman’s Top Ten Worst Graphs (including one of his own!): https://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/.
• Ross Ihaka’s Good and Bad Graphs: https://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture03.pdf.
• Edward Tufte: https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=00040Z.
• Junk Charts: https://junkcharts.typepad.com/junk_charts/.
• WTF Visualization: http://viz.wtf.
Bad Graphs: Pie Charts
Figure 12: Top 10 Google salaries by job category: Pie chart. https://junkcharts.typepad.com/junk_charts/2011/10/the-massive-burden-of-pie-charts.html. What’s the message? What do the angles represent? What’s a better graph?
Bad Graphs: Pie Charts
Figure 13: Top 10 Google salaries by job category: Interval chart. https://junkcharts.typepad.com/junk_charts/2011/10/the-massive-burden-of-pie-charts.html.
Bad Graphs: Pie Charts
[Figure: interval chart with one row per job category (Lead Software Engineer Contractor, Directors, Human Resources Director, Engineering Director, Senior Partner Technology Manager, Staff User Experience Designer, Marketing Director, Senior Managers*, Group Product Manager). x axis: Salary, thousands of dollars (140 to 240).]
Figure 14: Top 10 Google salaries by job category: Interval chart. Sorted by midpoint of salary range.
Bad Graphs: Pie Charts
[Figure: interval chart with one row per job category. x axis: Salary, thousands of dollars (140 to 240).]
Figure 15: Top 10 Google salaries by job category: Interval chart. Sorted by salary range.
Bad Graphs: Pie Charts
Figure 16: Google Home query categories: Pie chart. http://viz.wtf/image/171134950336. Unreadable. Can’t match numbers to categories. What’s a better graph?
Bad Graphs: Pie Charts
Figure 17: Bitcoin wealth distribution: Pie chart. http://viz.wtf/image/166329900475. What’s the message? How to compare shapes and areas? Without text, pie uninformative. What’s a better graph?
Bad Graphs: Pie Charts
[Figure: two scatterplots. Left: % BTC owned (y axis, 0 to 100) against % of top addresses (x axis, 0 to 100). Right: % BTC owned (y axis, 0 to 100) against % of bottom addresses (x axis, 0 to 100).]
Figure 18: Bitcoin wealth distribution: Scatterplot.
Bad Graphs: Multilevel Donut Charts
Figure 19: Goldman Sachs job listings: Multilevel donut chart. https://s3.amazonaws.com/cbi-research-portal-uploads/2017/09/18173935/GSteardownjobs. What’s the message? Unreadable. What’s a better graph?
Bad Graphs: Wordclouds
[Figure: wordcloud of words from the speeches, with font size reflecting frequency.]
Figure 20: State of the Union speeches 2010 and 2011: Wordcloud. Frequency of words with at least 15 occurrences. What’s the message? How to compare frequencies of words? What’s a better graph?
Bad Graphs: Wordclouds
[Figure: dotplot with one row per word. x axis: Frequency.]
Figure 21: State of the Union speeches 2010 and 2011: Dotplot. Frequency of words with at least 15 occurrences.
Bad Graphs: Wordclouds
Figure 22: Gilets jaunes: Wordcloud. Frequency of expressions and hashtags on Twitter for first four days of gilets jaunes movement. How to compare frequencies between days? https://www.lexpress.fr/actualite/societe/gilets-jaunes-ce-qu-en-disent-les-francais_2055542.html.
Bad Graphs: Wordclouds
Figure 23: Names: Wordcloud. https://www.wordclouds.com/?cloud=names.
Bad Graphs: Wordclouds
Figure 24: Business words: Wordcloud. https://www.wordclouds.com/?cloud=business
Bad Graphs
• Chart junk. The previous graphs exemplify “chart junk”, i.e., they contain superfluous elements that are not necessary to convey the information contained in the data, but instead distract the viewer from this information or even mask or distort important information.
• Pie charts.
  ▸ Frequency represented by angle/area.
  ▸ Angles and areas are hard to perceive and compare.
  ▸ Pie charts quickly become unreadable for more than a handful of values.
  ▸ Listing the values is often better – they are actually often added to a pie chart anyway!
  ▸ How to select order of categories?
  ▸ Not amenable to comparing distributions; side-by-side comparisons not effective.
Bad Graphs
  ▸ Hard to extend to multiple variables.
  ▸ A lot of junk often added to pie charts, e.g., thickness, slice explosion.
• Wordclouds/tag clouds.
  ▸ Frequency represented by font size.
  ▸ Neither area nor height corresponds to frequency of words.
  ▸ How do longer words compare with shorter words?
  ▸ How are capital letters handled?
  ▸ How to calculate relative difference in frequency between two words?
  ▸ How are the words ordered within the cloud (alphabetical, frequency)?
  ▸ Not amenable to comparing distributions; side-by-side comparisons not effective.
  ▸ How to extend to multiple variables?
  ▸ A lot of junk often added to word clouds.
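Several of the questions raised about wordclouds go away if the word frequencies are computed explicitly and shown on a common aligned scale, as in a dotplot. The sketch below counts words and prints a rough text dotplot sorted by frequency. The sample sentence is made up, the tokenization rule (lowercase, letters and apostrophes only) is one arbitrary choice among many, and the function names are ours.

```python
import re
from collections import Counter


def word_frequencies(text, min_count=1):
    """Count lowercase words and return (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


def text_dotplot(freqs):
    """One line per word: position along a common scale encodes the count."""
    width = max(len(word) for word, _ in freqs)
    return [f"{word:>{width}} |{'-' * n}o {n}" for word, n in freqs]


# Hypothetical text, for illustration only.
sample = "Jobs and energy and jobs; new jobs, clean energy."

freqs = word_frequencies(sample)
for line in text_dotplot(freqs):
    print(line)
```

Because every count is read off the same horizontal scale, word length and capitalization no longer affect the comparison, and the relative difference between any two words can be read directly from the counts.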
Gapminder
Gapminder. (https://www.gapminder.org)
• We will use data from Gapminder to reason through the process of data visualization, e.g., population, population density, life expectancy, income for each country.
• Note that in this case we have a census, i.e., there is no sampling involved³.
• Gapminder is a Swedish foundation co-created in 2005 by Hans Rosling (Professor of International Health at Karolinska Institute) and family members.
• “Gapminder is a fact tank, not a think tank.”
  “Gapminder measures ignorance about the world.”
  “Gapminder makes global data easy to use and understand.”
  “Gapminder promotes Factfulness, a new way of thinking.”
Gapminder
• Gapminder developed Trendalyzer, data visualization software providing dynamic and interactive graphics of data compiled by organizations such as the United Nations and the World Bank. Trendalyzer was acquired by Google in 2007.

³ Some of the data could be estimates, but we won’t concern ourselves with this at this point.
Gapminder
Figure 25: Gapminder: World Poster 2015. “How Does Income Relate to Life Expectancy? Short answer - Rich people live longer.” Bubble chart with four variables displayed in 2D.
[Bubble chart, “Health & Income of Nations in 2015”: life expectancy (years) vs. GDP per capita ($ adjusted for price differences, PPP 2011) for all 182 nations recognized by the UN. The x-axis uses a log scale from $1 000 to $128 000, split into income levels 1 to 4; bubbles are colored by region and labeled by country. Source: www.gapminder.org, version 15.]
DATA SOURCES. INCOME: World Bank’s GDP per capita, PPP (2011 international $). Income of Syria & Cuba are Gapminder estimates. X-axis uses log-scale to make a doubling income show same distance on all levels. POPULATION: Data from UN Population Division. LIFE EXPECTANCY: IHME GBD-2015, as of Oct 2016.
ANIMATING GRAPH: Go to www.gapminder.org/tools to see how this graph changed historically and compare 500 other indicators. LICENSE: Our charts are freely available under Creative Commons Attribution License. Please copy, share, modify, integrate and even sell them, as long as you mention: “Based on a free chart from www.gapminder.org”.
Software
• Most of the plots below are produced with Python’s seaborn library, using default arguments.
• Default settings typically do not correspond to the most basic version of the plot, but rather impose many decisions on the plot, e.g., color, legend, ordering. Experiment with different settings to make sure you get the plot you want.
• Seaborn tutorial: https://seaborn.pydata.org/tutorial.html.
  Each function has many arguments to customize the plots. As usual, consult documentation.
• Datasets available at: https://github.com/mwaskom/seaborn-data.
  E.g., Titanic survival dataset, Fisher’s iris dataset.
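A minimal sketch of the point about default settings, assuming seaborn 0.11 or later together with pandas and matplotlib. The toy data frame and the function name `default_vs_custom` are hypothetical and only stand in for the Gapminder data; `sns.stripplot` and its `order`, `jitter`, and `color` arguments are part of seaborn's documented interface.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not the Gapminder table used in the slides.
toy = pd.DataFrame({
    "region": ["a", "a", "a", "b", "b", "b"],
    "life_expectancy": [61.0, 64.5, 70.2, 74.1, 78.3, 82.0],
})


def default_vs_custom(df):
    """Draw the same stripplot with default and with explicit settings."""
    fig, (ax_default, ax_custom) = plt.subplots(1, 2, figsize=(8, 3))
    # Left: seaborn decides color, jitter, and category order.
    sns.stripplot(data=df, x="region", y="life_expectancy", ax=ax_default)
    # Right: the same data with order, jitter, and color chosen explicitly.
    sns.stripplot(data=df, x="region", y="life_expectancy",
                  order=["b", "a"], jitter=False, color="black", ax=ax_custom)
    ax_default.set_title("defaults")
    ax_custom.set_title("explicit settings")
    return fig


fig = default_vs_custom(toy)
```

Comparing the two panels side by side is one way to see which decisions the defaults made on your behalf.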
One Quantitative Variable
How would you visualize life expectancy in 2018 over all countries?

count    182.000000
mean      72.726374
std        7.237996
min       51.100000
25%       67.150000
50%       74.100000
75%       78.075000
max       84.200000
Stem-And-Leaf Plots
Key: aggr|stem|leaf
     182 | 84 | 0  =  84.0 x 1.0 = 84.0

aggr  stem  leaves
 182   84   02
 180   83   25
 178   82   1244446669
 168   81   11223333458889
 154   80   0125778
 147   79   13446
 142   78   00122367
 134   77   0223466677899
 121   76   0125678899
 111   75   1223355578999
  98   74   011238899
  89   73   123448
  83   72   0002334456
  73   71   115569
  67   70   3555679
  60   69   138
  57   68   0002378
  50   67   113389
  44   66   114689
  38   65   0245788
  31   64   356
  28   63   14569
  23   62   24599
  18   61   01112269
  10   60   025
   7   59   57
   5   58   067
   2   57
   2   56
   2   55
   2   54
   2   53
   2   52
   2   51   16

Figure 26: Life expectancy, 2018.
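The aggr|stem|leaf layout of Figure 26 can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code behind the figure: values are positive and rounded to one decimal, the stem is the integer part, the leaf is the first decimal digit, and the aggregate column counts the values in that row and all rows below it. The function name `stem_and_leaf` and the toy values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return rows (aggregate count, stem, leaves), largest stem first."""
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)                 # e.g. 58.6 -> 586
        leaves[tenths // 10].append(tenths % 10)
    rows = []
    remaining = len(values)
    # Walk down through every stem, including empty ones.
    for stem in range(max(leaves), min(leaves) - 1, -1):
        digits = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        rows.append((remaining, stem, digits))
        remaining -= len(digits)
    return rows


# Hypothetical toy values, not the Gapminder data.
toy = [51.1, 51.6, 58.0, 58.6, 58.7, 59.5, 59.7]
for aggr, stem, digits in stem_and_leaf(toy):
    print(f"{aggr:>4} {stem:>3} {digits}")
```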
Stripplots
Figure 27: Life expectancy, 2018. Right: Jittering, i.e., adding random noise, to avoid overplotting.
[Two stripplots, without (left) and with (right) jittering; y-axis: life expectancy, from 50 to 85.]
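Jittering as described in the caption of Figure 27 amounts to adding small random offsets along the axis that carries no information. A minimal sketch, assuming uniform noise; the function name `jitter`, the width of 0.2, and the toy values are illustrative choices, not taken from the slides.

```python
import random


def jitter(n, width=0.2, seed=0):
    """Return n horizontal offsets drawn uniformly from [-width, width]."""
    rng = random.Random(seed)                  # seeded for reproducibility
    return [rng.uniform(-width, width) for _ in range(n)]


# Hypothetical values with ties, which would overplot in a plain stripplot.
values = [74.1, 74.1, 74.1, 78.0, 78.0]
# (x, y) pairs to plot: y is untouched, only the x position is perturbed.
points = list(zip(jitter(len(values)), values))
```

The noise is applied only to the plotting position, so the displayed life expectancy values themselves are unchanged.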
Histograms
Figure 28: Life expectancy, 2018.
[Histogram; x-axis: life expectancy, from 50 to 85.]
Histograms
Figure 29: Life expectancy, 2018. Different numbers of bins.
[Overlaid histograms; legend: bins=default, 2, 20; x-axis: life expectancy, from 50 to 85.]
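Figure 29 shows how much a histogram depends on the number of bins. The effect can be sketched with a small equal-width binning function; `histogram_counts` and the toy data are hypothetical, and the function assumes every value lies in [lo, hi].

```python
def histogram_counts(values, n_bins, lo, hi):
    """Count values in n_bins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The right edge hi is counted in the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical bimodal toy data, not the Gapminder values.
toy = [52, 53, 54, 55, 56, 74, 75, 76, 77, 78, 79, 80]
coarse = histogram_counts(toy, 2, 50, 85)   # two bins hide the gap
fine = histogram_counts(toy, 7, 50, 85)     # seven bins reveal it
print(coarse)
print(fine)
```

With two bins the data look like one broad distribution; with seven bins the empty region between the two groups becomes visible.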
Density Plots
Figure 30: Life expectancy, 2018.
[Density plot; x-axis: life expectancy, from 50 to 90; y-axis from 0.00 to 0.05.]
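A density plot is typically based on kernel density estimation, and its shape depends on the bandwidth. A minimal sketch with a Gaussian kernel follows; the kernel choice, the function name `gaussian_kde`, the two bandwidths, and the toy data are assumptions made for illustration, not the settings used for Figure 30.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function x -> kernel density estimate (Gaussian kernel)."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centered at each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical toy values.
toy = [55.0, 60.0, 72.0, 74.0, 75.0, 80.0]
smooth = gaussian_kde(toy, bandwidth=5.0)   # wide bandwidth: one smooth curve
rough = gaussian_kde(toy, bandwidth=0.5)    # narrow bandwidth: isolated spikes
```

Evaluating both estimates on a grid and plotting them shows the trade-off: the wide bandwidth smooths over the gap between 60 and 72, while the narrow one drops to nearly zero there.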
Density Plots

Figure 32: Life expectancy, 2018.
[Plot; y-axis: life expectancy, from 50 to 85.]
One Quantitative Variable and One Qualitative Variable
Stripplots
Figure 33: Life expectancy by region, 2018.
[Stripplot; x-axis: six_regions (south_asia, europe_central_asia, middle_east_north_africa, sub_saharan_africa, america, east_asia_pacific); y-axis: life expectancy, from 50 to 85.]
Histograms

Figure 35: Life expectancy by region, 2018.
[Overlaid curves, one per region; legend: middle_east_north_africa, america, east_asia_pacific, sub_saharan_africa, europe_central_asia, south_asia; x-axis from 50 to 90; y-axis from 0.00 to 0.10.]
Boxplots
Figure 36: Life expectancy by region, 2018.
[Side-by-side boxplots; x-axis: six_regions; y-axis: life expectancy, from 50 to 85.]
Violin Plots
Figure 37: Life expectancy by region, 2018.
[Side-by-side violin plots; x-axis: six_regions; y-axis: life expectancy, from 50 to 90.]
Log-Transformation
Figure 38: Income, 2018. Left: Income (GDP/capita, inflation-adjusted $). Right: Log-transformed income.
[Two panels by six_regions; y-axis: income, on a linear scale from 0 to 120000 (left) and on a log scale from 10^3 to 10^5 (right).]
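Why the log transformation helps for income can be checked with simple arithmetic: on a log scale, each doubling of income covers the same distance. The income values below are the axis ticks of the Gapminder poster ($1 000 to $128 000); the variable names are illustrative.

```python
import math

# Axis ticks of the Gapminder poster: each value doubles the previous one.
incomes = [1000, 2000, 4000, 8000, 16000, 32000, 64000, 128000]

# Spacing between consecutive ticks on a linear scale keeps growing.
linear_gaps = [b - a for a, b in zip(incomes, incomes[1:])]

# Spacing on a log10 scale is constant and equal to log10(2).
log_positions = [math.log10(v) for v in incomes]
log_gaps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print(linear_gaps)
print([round(g, 3) for g in log_gaps])
```

On the linear scale most countries are squeezed into the lowest part of the axis; on the log scale the same data spread out evenly across income levels.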
Time Series
How did life expectancy vary between 1800 and 2018?
Time Series
Figure 39: Life expectancy over time for five countries.
[Line plot; legend: United States, Russia, China, Syria, Cambodia; x-axis: year, from 1800 to 2000; y-axis: life expectancy.]
Time Series
Figure 40: Life expectancy over time.
[Plot; x-axis: year, from 1800 to 2000; y-axis: life expectancy, from 20 to 80.]
One Quantitative Variable: Summary
Displaying and comparing marginal distributions for quantitative data.
• Stem-and-leaf plots.
  - Simple pen-and-paper method that displays every value; practical only for a handful of values.
  - Not amenable to comparisons between distributions.
  - No reason to use these days.
• Stripcharts/stripplots. (Sometimes referred to as dotcharts/dotplots; related to rug plots.)
  - Effective for displaying every value when the number of values is moderate.
  - Can use side-by-side stripplots to compare multiple distributions.
• Histograms.
  - Classical method for displaying a single distribution.
  - Sensitive to bin width and bin boundaries.
  - Cannot easily display and compare multiple distributions.
• Density plots.
  - Based on kernel density estimation (cf. smoothing).
  - Sensitive to bandwidth, but methods are available to select the bandwidth.
  - Effective for displaying and comparing multiple distributions.
• Boxplots. (A.k.a. box-and-whisker plots.)
  - Summarize distribution by only 5 numbers (+ outliers): median, upper and lower quartiles, and whiskers extending at most 1.5 times the inter-quartile range (IQR) above the upper quartile and below the lower quartile.
  - Possible loss of information, e.g., multimodality.
  - Effective for displaying and comparing multiple distributions, especially with notches.
• Violin plots.
  - Trendy hybrids of boxplots and density plots.
  - Redundant (the density is drawn twice!), unless a different density is plotted on each side.
  - Same limitations and issues as with boxplots and density plots.
  - Cannot compare densities as readily as with standard density plots.
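The numbers behind a boxplot, as listed in the summary above, can be computed directly. This is a sketch under one common convention: quartiles from `statistics.quantiles` with `method="inclusive"` (Python 3.8 or later), and whiskers drawn to the most extreme values within 1.5 IQR of the quartiles. Plotting libraries may use slightly different quartile definitions; the function name `boxplot_stats` and the toy data are hypothetical.

```python
import statistics


def boxplot_stats(values):
    """Return quartiles, whisker ends, and outliers for a boxplot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


# Hypothetical toy values with one extreme observation.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

Note how the whole sample is reduced to a few numbers: any multimodality inside the box would be invisible.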
Multiple Quantitative Variables
How would you visually examine the relationship between life expectancy and income in 2018 over all countries?
Scatterplots
Figure 41: Life expectancy vs. income, 2018. Left: Income (GDP/capita, inflation-adjusted $). Right: Log-transformed income.
[Two scatterplots; y-axis: life expectancy, from 50 to 85; x-axis: income, on a linear scale from 0 to 120000 (left) and on a log scale from 10^3 to 10^5 (right).]
Scatterplots
Figure 42: Life expectancy vs. income, colored by region, 2018.
[Scatterplot; x-axis: income, log scale from 10^3 to 10^5; y-axis: life expectancy, from 50 to 85; legend: six_regions (south_asia, europe_central_asia, middle_east_north_africa, sub_saharan_africa, america, east_asia_pacific).]
Bubble Charts
Figure 43: Life expectancy vs. income, colored by region and with area of bubbles representing population, 2018.
[Bubble chart; x-axis: income, log scale from 10^3 to 10^5; y-axis: life expectancy, from 50 to 85; color legend: four_regions (asia, europe, africa, americas); size legend: population (0, 500000000, 1000000000, 1500000000).]
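Figure 43 encodes population as bubble area, not radius. A short sketch of the difference, using the population values from the size legend; the function name `bubble_radius` and the maximum radius of 30 are illustrative assumptions.

```python
import math


def bubble_radius(value, max_value, max_radius=30.0):
    """Radius chosen so that the bubble AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


# Population values from the size legend of Figure 43.
populations = [500_000_000, 1_000_000_000, 1_500_000_000]
radii = [bubble_radius(p, max(populations)) for p in populations]
areas = [math.pi * r ** 2 for r in radii]

# Area ratios match the population ratios 1 : 2 : 3.
print([round(a / areas[0], 6) for a in areas])
```

Scaling the radius linearly with population instead would make a country with three times the population look nine times as large.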
Mean-Difference Plots
Figure 44: Life expectancy, 2018 vs. 1998.
[Two panels. Left: scatterplot of life expectancy in 2018 vs. 1998. Right: mean-difference plot, Difference vs. Means, annotated with mean diff: 5.87 and -SD1.96: -1.21.]
Scatterplot Matrices
Figure 45: Life expectancy, population, population density, and income, by region, 2018.
[Scatterplot matrix of life expectancy, log pop, log pop dens, and log inc; points colored by six_regions.]
Scatterplot Matrices
[Scatterplot matrix of three variables x, y, and z, each ranging from 0.0 to 1.0.]

Figure 47: RANDU RNG. Triples of successive numbers.
[Three-dimensional scatterplot with axes x, y, and z.]
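The structure that the pairwise scatterplots miss can be verified numerically. The sketch below assumes the standard definition of RANDU, x_{k+1} = 65539 x_k mod 2^31, which is not spelled out in the slides. Because 65539^2 ≡ 6 · 65539 − 9 (mod 2^31), every triple of successive values satisfies x_{k+2} = 6 x_{k+1} − 9 x_k (mod 2^31), so the triples fall on a small number of parallel planes. The function name `randu` and the seed are illustrative.

```python
def randu(seed, n):
    """Generate n RANDU integers: x_{k+1} = 65539 * x_k mod 2**31."""
    values, x = [], seed
    for _ in range(n):
        x = (65539 * x) % 2**31
        values.append(x)
    return values


xs = randu(seed=1, n=1000)

# Residual of the linear relation for every triple of successive values.
residuals = [
    (c - 6 * b + 9 * a) % 2**31
    for a, b, c in zip(xs, xs[1:], xs[2:])
]
print(set(residuals))
```

Dividing the values by 2**31 gives the numbers in [0, 1) that would be plotted as x, y, and z.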
RANDU RNG

Figure 48: Simulated data, n = 60,000: Scatterplot.
[Scatterplot; x-axis: x, from 4 to 20; y-axis: y, from 6 to 20.]
Overplotting: Hexagonal Binning
Figure 49: Simulated data, n = 60,000: Hexagonal binning.
[Hexagonal binning plot; x-axis: x, from 4 to 20; y-axis: y, from 6 to 20.]
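Hexagonal binning replaces individual points by counts per hexagonal cell. A minimal sketch of one standard way to assign points to cells: hexagon centers form two interleaved rectangular lattices, and each point goes to the nearer of its closest center on each lattice. The function name `hexbin`, the cell width, and the simulated Gaussian data are assumptions for illustration, not the data or code behind Figure 49.

```python
import math
import random
from collections import Counter


def hexbin(points, width):
    """Count points per hexagonal cell; returns {center (x, y): count}."""
    row = math.sqrt(3.0) * width      # vertical spacing within one lattice
    counts = Counter()
    for x, y in points:
        sx, sy = x / width, y / row
        # Closest center on lattice 1 (integer coordinates).
        i1, j1 = round(sx), round(sy)
        # Closest center on lattice 2 (half-integer coordinates).
        i2, j2 = math.floor(sx) + 0.5, math.floor(sy) + 0.5
        # Squared distances, up to the common factor width**2.
        d1 = (sx - i1) ** 2 + 3.0 * (sy - j1) ** 2
        d2 = (sx - i2) ** 2 + 3.0 * (sy - j2) ** 2
        i, j = (i1, j1) if d1 <= d2 else (i2, j2)
        counts[(i * width, j * row)] += 1
    return counts


# Hypothetical simulated data with n = 60,000 points.
rng = random.Random(0)
points = [(rng.gauss(12, 2), rng.gauss(12, 2)) for _ in range(60000)]
cells = hexbin(points, width=0.5)
```

Coloring each cell by its count then shows the density of points that an ordinary scatterplot hides behind overplotting.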
Overplotting: Scatterplot Smoothing
Figure 50: Simulated data, n = 60,000: Scatterplot smoothing.
[Smoothed scatterplot; x-axis: x, from 2.5 to 20.0; y-axis: y, from 5.0 to 22.5.]
Multiple Quantitative Variables: Summary
Displaying joint distributions for quantitative data.
• While density plots and boxplots are useful for comparing two or more marginal distributions (e.g., in terms of location and scale), they do not provide any information about joint distributions and, in particular, associations between two variables.
• Scatterplots and scatterplot matrices.
  - Useful for examining linear association between two variables.
  - Can extend beyond two variables by using color and plotting symbol area, as in bubble charts.
  - However, can miss important higher-dimensional patterns (cf. RANDU example).
• Mean-difference plots.
  - Rotated and scaled version of scatterplot.
  - Better suited to examining differences than associations.
• Bubble charts. A bubble chart is a type of scatterplot that displays one or two extra dimensions using area and color.
• Parallel coordinates plots.
  - Natural for visualizing time series data, i.e., same variable measured across time. Cf. train schedules.
  - Can also be used for visualizing the relationship between multiple variables, but trickier: each line corresponds to an observation and each axis to a variable.
  - Three important considerations that can affect interpretation of the plot: the order, the rotation, and the scaling of the axes.
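The statement that a mean-difference plot is a rotated and scaled scatterplot can be made concrete: each pair (x, y) is mapped to ((x + y) / 2, y − x). The sketch below also computes the mean difference and mean ± 1.96 SD reference lines, as annotated in Figure 44. The function name `mean_difference` and the toy values are hypothetical.

```python
import statistics


def mean_difference(before, after):
    """Map paired values to (mean, difference) coordinates.

    Returns the means, the differences, the mean difference, and the
    (lower, upper) reference lines at mean difference -/+ 1.96 SD.
    """
    means = [(a + b) / 2 for a, b in zip(before, after)]
    diffs = [b - a for a, b in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    limits = (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)
    return means, diffs, mean_diff, limits


# Hypothetical life expectancies for three countries in 1998 and 2018.
y1998 = [50, 60, 70]
y2018 = [56, 64, 78]
means, diffs, mean_diff, limits = mean_difference(y1998, y2018)
print(means, diffs, mean_diff, limits)
```

Plotting `diffs` against `means` puts the quantity of interest, the change between the two years, directly on the vertical axis.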
Overplotting

How would you visualize the 2017 UK election results?
Number of seats for each of 13 parties.

      Party     MPs
 0    CON       318
 1    LAB       261
 2    SNP        35
 3    LIB DEM    12
 4    DUP        10
 5    SF          7
 6    PC          4
 7    GREEN       1
 8    IND         1
 9    OTHER       1
10    UKIP        0
11    SDLP        0
12    UUP         0
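Before choosing a graph, it can help to look at the numbers themselves. The sketch below takes the seat counts from the table above, computes each party's share of seats, and prints a crude text dotplot in which the position of the marker on a common scale encodes the count. The variable names and the one-character-per-10-seats scale are illustrative choices.

```python
# Seat counts from the table of 2017 UK election results.
seats = {
    "CON": 318, "LAB": 261, "SNP": 35, "LIB DEM": 12, "DUP": 10,
    "SF": 7, "PC": 4, "GREEN": 1, "IND": 1, "OTHER": 1,
    "UKIP": 0, "SDLP": 0, "UUP": 0,
}

total = sum(seats.values())
# Share of seats in percent, rounded to one decimal.
shares = {party: round(100 * n / total, 1) for party, n in seats.items()}

# Text dotplot: one leading "." per 10 seats, "o" marks the value.
for party, n in sorted(seats.items(), key=lambda item: -item[1]):
    print(f"{party:<8} {n:>3} {shares[party]:>5}%  {'.' * (n // 10)}o")
```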
Pie Charts
Figure 51: UK Election Results 2017. Number of seats for each of 13 parties.
[Pie chart; slice labels: CON 48.9%, LAB 40.2%, SNP 5.4%, LIB DEM 1.8%, DUP 1.5%, SF 1.1%, PC 0.6%, GREEN, IND, and OTHER 0.2% each, UKIP, SDLP, and UUP 0.0% each.]
Barplots
Figure 52: UK Election Results 2017. Number of seats for each of 13 parties.
[Barplot; x-axis: Party (CON, LAB, SNP, LIB DEM, DUP, SF, PC, GREEN, IND, OTHER, UKIP, SDLP, UUP); y-axis: MPs, from 0 to 300.]
Dotplots
Figure 53: UK Election Results 2017. Number of seats for each of 13 parties.
[Dotplot; y-axis: Party (CON, LAB, SNP, LIB DEM, DUP, SF, PC, GREEN, IND, OTHER, UKIP, SDLP, UUP); x-axis: MPs, from 0 to 300.]
Lollipop Plots
Figure 54: UK Election Results 2017. Number of seats for each of 13 parties.
[Lollipop plot; x-axis: party; y-axis from 0 to 300.]
One Qualitative Variable: Summary
• Pie charts.
  - Frequency represented by angle/area.
  - Angles and areas are hard to perceive and compare.
  - Pie charts quickly become unreadable for more than a handful of values.
  - Listing the values is often better – they are actually often added to a pie chart anyway!
  - How to select order of categories?
  - Not amenable to comparing distributions; side-by-side comparisons not effective.
  - Hard to extend to multiple variables.
  - A lot of junk often added to pie charts, e.g., thickness, slice explosion.
• Wordclouds/tag clouds.
  - Frequency represented by font size.
One Qualitative Variable: Summary
Data
Visualization
I Neither area nor height corresponds to frequency of words.
Dudoit
I How do longer words compare with shorter words?
I How are capital letters handled?
Motivation
I How to calculate relative difference in frequency between
Principles of
Data
two words?
Visualization I How are the words ordered within the cloud (alphabetical,
Do We Really
Need a Graph? frequency)?
General
Considerations I Not amenable to comparing distributions; side-by-side
Graphical
Perception
Bad Graphs
comparisons not effective.
Survey of
I How to extend to multiple variables?
Data I A lot of junk often added to word clouds.
Visualization
Techniques • Barcharts/barplots.
One
Quantitative I Based on length and position on common aligned scale.
Variable
Multiple I Add an irrelevant dimension (thickness of bar).
Quantitative
Variables
One Qualitative I How to select order of categories?
Variable
Multiple I Not readily amenable to comparisons.
Qualitative
Variables
Conditional 104 / 120
Plots
One Qualitative Variable: Summary
Data
Visualization I Extension to multiple variables problematic.
Dudoit
• Dotcharts/dotplots. (And interval charts.)
Motivation I Based on length and position on common aligned scale.
Principles of I Display only the relevant information.
Data
Visualization I How to select order of categories?
Do We Really
Need a Graph? I More amenable to comparisons and extensions to multiple
General
Considerations variables.
Graphical
Perception
Bad Graphs
• Lollipop plots.
Survey of I Similar to dotcharts/dotplots (with added stem) and
Data
Visualization barcharts/barplots.
Techniques I Stem is redundant.
One
Quantitative I How to select order of categories?
Variable
Multiple I Not readily amenable to comparisons.
Quantitative
Variables
I Extension to multiple variables problematic.
One Qualitative
Variable
Multiple
Qualitative
Variables
Conditional 105 / 120
Plots
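One possible answer to "How to select order of categories?" is to sort by decreasing frequency. The sketch below does that and draws a text-only dotplot in which each dot sits on a common aligned scale. The function names and the counts are hypothetical and only serve as illustration; they are not taken from the figures above.

```python
def ordered_counts(counts):
    """Return (category, count) pairs by decreasing count; ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def text_dotplot(counts, width=40):
    """Render one row per category; the dot position encodes the count on a shared scale."""
    pairs = ordered_counts(counts)
    top = max(count for _, count in pairs) or 1  # avoid division by zero if all counts are 0
    label_width = max(len(name) for name, _ in pairs)
    rows = []
    for name, count in pairs:
        pos = round(count / top * width)
        rows.append(f"{name.rjust(label_width)} |{' ' * pos}o {count}")
    return "\n".join(rows)


# Hypothetical counts, for illustration only.
example = {"A": 12, "B": 30, "C": 3, "D": 12}
print(text_dotplot(example))
```

Sorting by frequency is only one choice; an alphabetical or otherwise meaningful order may be preferable when categories have to be looked up by name.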
Multiple Qualitative Variables
How would you display survival data on the Titanic according to class, gender, and age?
Barplots
Figure 55: Titanic: Survival by class.
(Two barplots of counts by class: First, Second, Third; legend: No, Yes.)
Dotplots
Figure 56: Titanic: Survival by class.
(Dotplot; x-axis: survival frequency per class; y-axis: First, Second, Third.)
Dotplots
Figure 57: Titanic: Survival by gender/age.
(Dotplot; x-axis: survival frequency per gender/age; y-axis: child, man, woman.)
Dotplots
Figure 58: Titanic: Survival by class and gender/age.
(Dotplot; x-axis: survival frequency per class and gender/age; groups child, man, woman, each with rows First, Second, Third.)
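The quantity on the x-axis of these dotplots is a survival frequency within each group. A minimal sketch of how such frequencies could be computed from individual records follows. The record layout, the field names ("class", "who", "survived"), and the records themselves are assumptions made for illustration; they are not the actual Titanic data.

```python
from collections import defaultdict


def survival_frequency(records, keys):
    """Proportion of survivors within each combination of the given keys."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += 1
        if rec["survived"]:
            survivors[group] += 1
    return {group: survivors[group] / totals[group] for group in totals}


# Hypothetical records, for illustration only (not the real Titanic data).
records = [
    {"class": "First", "who": "woman", "survived": True},
    {"class": "First", "who": "man", "survived": False},
    {"class": "First", "who": "man", "survived": True},
    {"class": "Third", "who": "woman", "survived": True},
    {"class": "Third", "who": "man", "survived": False},
    {"class": "Third", "who": "man", "survived": False},
    {"class": "Third", "who": "child", "survived": False},
    {"class": "Third", "who": "child", "survived": True},
]

by_class = survival_frequency(records, ["class"])
by_class_and_who = survival_frequency(records, ["class", "who"])
```

Each entry of `by_class_and_who` would correspond to one dot in a plot laid out like Figure 58.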
Mosaic Plots
Figure 59: Titanic: Survival and class.
(Mosaic plot; axes: class (First, Second, Third) and alive (no, yes).)
Mosaic Plots
Figure 60: Titanic: Survival, class, and gender/age.
(Mosaic plot; axes: class (First, Second, Third) and alive (no, yes), split by child, man, woman.)
Multiple Qualitative Variables: Summary
The following types of plots are used to represent conditional distributions for multiple categorical variables or counts for hierarchical categories.
• Multilevel donut/pie/sunburst plots.
  - Same or worse perception issues as with univariate pie charts.
  - Which variable to choose for “outer” layer?
• Barcharts/barplots.
  - For two categorical variables, a barchart/barplot displays the counts (or percentages) for each category of the second variable within each category of the first variable, i.e., the conditional distribution of the second variable given the first.
  - Which variable to choose as “first”?
  - In a side-by-side barplot, the frequencies for the second variable are displayed as juxtaposed bars.
  - In a stacked barplot, the frequencies for the second variable are stacked, so that their total height is the total count for the category of the first variable or 100 percent.
  - Hard to compare frequencies between categories of first variable with both types of barplots.
  - Hard to compare frequencies of second variable within categories of first variable with stacked barplot.
  - Circular barcharts/barplots: Eye-catching, but even harder to compare frequencies.
• Mosaic plots.
  - Color and shading of the tiles can be used to represent unusually large or small counts, the sign and magnitude of residuals (deviations) for particular models (e.g., independence).
  - For two categorical variables, the width of each tile is proportional to the marginal frequency of the category for the first variable and the height of the tile to the conditional frequency of the category for the second variable given the first.
  - Can be hard to read mosaic plots for more than two variables.
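The tile geometry described for two-variable mosaic plots can be sketched as follows: widths come from the marginal frequencies of the first variable, heights from the conditional frequencies of the second given the first, so each tile's area equals the joint frequency. The function name, the nested-dictionary layout, and the counts are assumptions for illustration; the sketch also ignores the small gaps usually drawn between tiles.

```python
def mosaic_tiles(table):
    """Tile geometry for a two-variable mosaic plot in the unit square.

    table[a][b] is the count for category a of the first variable and
    category b of the second. Returns {(a, b): (x, y, width, height)}.
    """
    grand_total = sum(sum(row.values()) for row in table.values())
    if grand_total == 0:
        raise ValueError("table has no observations")
    tiles = {}
    x = 0.0
    for a, row in table.items():
        row_total = sum(row.values())
        width = row_total / grand_total  # marginal frequency of a
        y = 0.0
        for b, count in row.items():
            # conditional frequency of b given a
            height = count / row_total if row_total else 0.0
            tiles[(a, b)] = (x, y, width, height)
            y += height
        x += width
    return tiles


# Hypothetical counts, for illustration only.
table = {
    "First": {"no": 10, "yes": 30},
    "Third": {"no": 45, "yes": 15},
}
tiles = mosaic_tiles(table)
```

With these counts, the "First" column takes 40 percent of the width and its "yes" tile 75 percent of the column height, so the tile's area (0.3) matches the joint frequency 30/100.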
Conditional Plots
Conditional plots/coplots/faceting/panels/small multiples.
• Collection of plots, where each plot represents the conditional distribution of one or more variables given a conditioning variable.
• Each plot corresponds to a value or set of values for the conditioning variable. For a quantitative conditioning variable, the ranges are typically chosen so that there are equal numbers of observations in each panel.
• The scales on the axes have to be the same for all panels.
• The colors (and legends) also have to be the same for all panels.
• E.g., scatterplots of life expectancy vs. income for each of the six world regions.
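Two of the points above lend themselves to a short sketch: splitting a quantitative conditioning variable into ranges with (nearly) equal numbers of observations, and computing axis limits shared by all panels. The function names and values are hypothetical. The groups here are non-overlapping and formed by rank, which is one simple assumption; plotting software may choose the ranges differently.

```python
def equal_count_groups(values, n_panels):
    """Split values into n_panels rank-based groups of (nearly) equal size."""
    if n_panels < 1:
        raise ValueError("n_panels must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    groups = []
    for i in range(n_panels):
        start = i * n // n_panels
        stop = (i + 1) * n // n_panels
        groups.append(ordered[start:stop])
    return groups


def shared_limits(panels):
    """Common (min, max) axis limits across all panels, each a list of numbers."""
    flat = [v for panel in panels for v in panel]
    return min(flat), max(flat)


# Hypothetical values of a conditioning variable, for illustration only.
values = [5, 1, 9, 3, 7, 2, 8]
groups = equal_count_groups(values, 3)
limits = shared_limits(groups)
```

Using the same `limits` for every panel is what makes positions comparable from one panel to the next.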
Conditional Plots
Figure 61: Life expectancy by region, 2018.
(y-axis: life.expectancy, 50 to 85; x-axis: region: america, east_asia_pacific, europe_central_asia, middle_east_north_africa, south_asia, sub_saharan_africa.)
Conditional Plots
(Figure: panels of life.expectancy by region, 50 to 85; region labels abbreviated as amrc, es__, er__, m___, sth_, sb__.)
References
• Peter Aldhous. Data visualization: basic principles. http://paldhous.github.io/ucb/2016/dataviz/week2.html.
• Ross Ihaka. Statistics 120 – Information Visualisation. https://www.stat.auckland.ac.nz/~ihaka/120/.
• Duncan Temple Lang. Data Visualization Workshops. http://dsi.ucdavis.edu/tag/data-visualization.html.

W. S. Cleveland and R. McGill. Graphical perception and graphical methods for analyzing scientific data. Science, 229(4716):828–833, 1985.

A. Gelman, C. Pasarica, and R. Dodhia. Let's practice what we preach: Turning tables into graphs. The American Statistician, 56(2):121–130, 2002.

E. J. Marey. La Méthode Graphique. Librairie de l’Académie de Médecine, 1885.

S. S. Stevens. On the psychophysical law. Psychological Review, 64(3):153–181, 1957.

E. R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 2nd edition, 2001.