You are on page 1of 92

Introduction to Data Science (IDS) course

Visual Analytics &


Information Visualization
Lecture 18
Outline of Today’s Lecture
• Visual Analytics
• Scope and Process
• Examples
• Challenges
• Advanced visualization techniques
• User tasks
• Visual encoding principles Visual Data Mining
Theory, Techniques and
Visualization Analysis and
Design: Principles,
Tools for Visual Analytics Techniques, and Practice
by Simeon Simoff, by Tamara Munzner.
Michael Böhlen, Arturas
Mazeika.

© PADS (use only with permission & acknowledgements)


Introduction

© PADS (use only with permission & acknowledgements)


Introduction

• Lots of (Big) data, but raw data, has no value, only extracted information
has.
• Not just known unknowns, also unknown unknowns.
• Visual exploration is a vital first step (understanding, spotting data quality
problems, building trust, etc.).
• Combination of visualization and automated learning is often superior.

© PADS (use only with permission & acknowledgements)


Seen before: Anscombe's quartet

Explore your data first!

© PADS (use only with permission & acknowledgements)


Charles Minard's 1869 chart showing the number of men in Napoleon’s
1812 Russian campaign army, their movements, as well as the
temperature they encountered on the return path.

© PADS (use only with permission & acknowledgements)


Classical Visualization Examples

John Snow's map of the 1854 cholera


John Snow (1813-1858) was an English physician and considered one of the fathers of modern
epidemiology. He traces the source of a cholera outbreak in Soho, London, in 1854. His
findings inspired fundamental changes in the water and waste systems of London. outbreak in Soho, London
© PADS (use only with permission & acknowledgements)
Classical Visualization Examples

Étienne-Jules
Marey 1885.
TGV today
(less than 2 hours)

© PADS (use only with permission & acknowledgements)


Classical Visualization Examples

New York Times 1981, showing 1888 numbers!


© PADS (use only with permission & acknowledgements)
Classical Visualization Examples

music-timeline.appspot.com
© PADS (use only with permission & acknowledgements) Based on album and artist statistics aggregated from Google Play Music
Classical Visualization Examples

dotted chart showing


6000 events with
• x-axis = time
• y-axis = case
© PADS (use only with permission & acknowledgements)
Classical Visualization Examples
Misleading magnitudes.

© PADS (use only with permission & acknowledgements)


Classical Visualization Examples

Same problems

© PADS (use only with permission & acknowledgements)


The trick

𝒉𝒉 𝟐𝟐𝟐𝟐

𝝅𝝅𝒓𝒓𝟐𝟐 𝒉𝒉

𝝅𝝅(𝟐𝟐𝟐𝟐)𝟐𝟐 𝟐𝟐𝟐𝟐 = 𝟖𝟖𝝅𝝅𝒓𝒓𝟐𝟐 𝒉𝒉


© PADS (use only with permission & acknowledgements) h = height, r = radius
Complimenting each other

visualization

(un)supervised
learning

© PADS (use only with permission & acknowledgements)


Visual Analytics is more than Visualization
Classical Visualization
• Usually either handles natural geometric
structures (e.g. MRI data) or abstract data
structures (e.g. trees and graphs)
• Solely focuses on the representation of the
data
• No to little interaction possibilities
• Not combing data from different sources
• Not supported by learning techniques
• Main goals: presentation, confirmatory
analysis or exploratory analysis

© PADS (use only with permission & acknowledgements)


Visual Analytics is more than Visualization
Visual Analytics
• Basic idea:
• Handles heterogeneous data from multiple
sources
• Combining automated learning with human skills
to find relations and draw conclusions
• Benefits from information visualization in
combination with human interaction
“Turn information
• Advantages:
overload into an
• Provides overview of the data opportunity”
• Provides possibility to explore the data
• Reduces complexity of certain tasks
• Helps to make decisions in time-critical situations

© PADS (use only with permission & acknowledgements)


Need both!

© PADS (use only with permission & acknowledgements)


Remember: Adversarial ML Examples

Sophisticated
learning techniques
can be amazingly
stupid!

© PADS (use only with permission & acknowledgements)


Remember: Process Discovery

Visualization cannot be
used to find concurrency
in processes.

We see weekly patterns and drifts, but cannot see the process.

© PADS (use only with permission & acknowledgements)


Remember: Process Discovery

We see the process, but cannot see weekly patterns and drifts.

© PADS (use only with permission & acknowledgements)


Visual Analytics
Scope and Tasks

© PADS (use only with permission & acknowledgements)


Visual Analytics - Scope
Visual Analytics combines
many different fields

D. Keim et. al, Visual Analytics Scope and


Challenges in: Visual Data Mining, Springer, 2008

© PADS (use only with permission & acknowledgements)


Visual Analytics - Process
Key idea: combine visualization
with model learning

• Overall goal is to obtain


knowledge from the given data
• The data needs to be
preprocessed/transformed to
be suitable for the analysis
(see previous lecture)
• The models are obtained by
applying mining/learning
techniques to the data which
can then be refined based on
the visualization
• The visualization provides
insights and interaction
possibilities to the user

D. Keim et. al, Solving problems with visual


© PADS (use only with permission & acknowledgements) analytics, Eurographics Association, 2010
Visual Analytics - Process
To obtain the best results a steady
interchange between automated data
mining techniques and manual user
interactions is required

“Analyze First – Show the


Important – Zoom, Filter
and Analyze Further – D. Keim et. al, Solving problems with visual
analytics, Eurographics Association, 2010

Details on Demand”
© PADS (use only with permission & acknowledgements)
Visual Analytics
Examples

© PADS (use only with permission & acknowledgements)


Visual Analytics - Examples
• The Terascale Supernova
Initiative Project runs
complex simulations to
gain a complete
understanding of core
collapse supernovas
• Complex relationships
inside the supernova (e.g.
rotation, radiation,
magnetic fields etc.)
produce tens of terabytes
of data per simulation
• Data is only
understandable after
visualizing it Visual Design and Analysis: Abstractions, Principles and Methods by T. Munzner

© PADS (use only with permission & acknowledgements)


Visual Analytics - Examples
• Visual performance
analysis of financial
data
• Small triangle shows
the absolute
performance of one
stock and the big one
compares one stock
to the entire market
• Red represents
negative growth and
green represents
positive growth

D. Keim et. al, Visual Analytics Scope and Challenges in: Visual Data Mining, Springer, 2008

© PADS (use only with permission & acknowledgements)


Visual Analytics - Examples
buy in 2001 buy in 2001
sell in 2003 sell in 2003
with loss with a tiny
profit

© PADS (use only with permission & acknowledgements)


Visual Analytics - Examples

worse better
than rest than rest

© PADS (use only with permission & acknowledgements)


Visual Analytics - Examples

but worse
than rest positive

© PADS (use only with permission & acknowledgements)


Visual Analytics - Examples
• Website to visualize the daily
development of companies
and sectors on the stock
market with several interaction
options
• Red color indicates bad
performance and green good
performance
• In this example: Size of each
sector is determined by sales
volume and each sector is split
further into colored rectangles
which correspond to sub
segments of the sector
https://www.maxblue.de/maerkte-analysen/boersen-kurse/marketmap.html

© PADS (use only with permission & acknowledgements)


Visual Analytics - Examples

Same data different view


© PADS (use only with permission & acknowledgements) https://www.maxblue.de/maerkte-analysen/boersen-kurse/marketmap.html
Visual Analytics - Examples
Events have a location,
a type, and a timestamp.

• VisAware for BioWatch system detects


biological and chemical attacks in the US
• Different categories of biological and
chemical agents are shown in the outer
colored ring
• Concentric rings represent sequential
time samples and show the presence of
the agent over time with the outer rings
being older time samples
• The map shows where sensors are set up
• The width of the connecting line between
sensor location and time ring shows the
probability of an actual attack

D. Keim et. al, Visual Analytics Scope and Challenges in:


Visual Data Mining, Springer, 2008
Visual Analytics - Examples

now
time

911 calls
Visual Analytics - Examples
https://www.economist.com/briefing/2015/09/12/strangers-in-strange-lands

Sankey diagram
immigration data 2015

Suggest a flow, but


strictly speaking no
time order.

Compare with Charles Minard's map


of the 1812 French invasion of Russia
Visual Analytics - Examples
Sankey diagram energy
flow building

http://www.sankey-diagrams.com/tag/building/
Visual Analytics - Examples
Sankey diagram plastic
streams 2015

https://www.plasticsrecyclers.eu/sites/d
efault/files/PRE_blueprint%20packaging
%20waste_Final%20report%202017.pdf
Periodic Table of Visualization Methods
https://www.visual-literacy.org/periodic_table/periodic_table.html
Visual Analytics
Challenges

© PADS (use only with permission & acknowledgements)


Visual Analytics - Challenges
• Meaningful: Visualization should have a purpose (and not
just look nice)
• Clear: Visualization should be interpretable (and not need a
long explanation)
• Correct: Visualization should be based on correct data and
not mislead (e.g., scaling problems)
• Actionable: Convert insights into actions
• Scalable: Still work with real-life data sets (human/machine)
• Acceptable: People need to accept and actually use the
visualizations provided
© PADS (use only with permission & acknowledgements)
Integration of Different Disciplines
• Most fields in the scope of visual
analytics, when considered by
themselves, are represented by a
certain system or tool
• To achieve the visual analytics
goal all these topics need to be
integrated into one visual
analytics tool/system
• Designing and implementing D. Keim et. al, Visual Analytics Scope and
such an integrated environment Challenges in: Visual Data Mining, Springer, 2008

is challenging
© PADS (use only with permission & acknowledgements)
Advanced Visualization
Techniques

© PADS (use only with permission & acknowledgements)


Advanced visualization techniques
In Lecture 2 we have already seen basic visualization techniques:

But they might not be enough for advanced analysis.

© PADS (use only with permission & acknowledgements)


Advanced visualization techniques
Especially for more complex datasets and data types we need
different visualization techniques

We focus on tabular data in this lecture!


© PADS (use only with permission & acknowledgements)
Advanced visualization - Scatter plot matrix
Scatter plot
• Shows a scatter plot for each matrix for
iris data set
combination of attributes in a using all
four
dataset with multiple attributes attributes
• Useful to detect correlations,
trends and outliers
• Advantage: Easier to interpret
and understand than 3-D
scatter plot
• Disadvantage: Quickly grows in 3D Scatter plot
matrix for iris
size as the number of attributes data set using
three
in the data set grows attributes

© PADS (use only with permission & acknowledgements)


Advanced visualization - Parallel coordinate plot

• Display each attribute as


parallel axes
• Plot the data points for each
attribute on the corresponding
axis
• Connect all points belonging to
one data entry

A B C
50 100 2.0
30 115 0.5

https://datavizcatalogue.com/methods/parallel_coordinates.html

© PADS (use only with permission & acknowledgements)


Advanced visualization - Parallel coordinate plot

• Interpretation
• Positive Correlation: Cluster
of similar lines
• Negative Correlation: No
cluster but similar crossing
point of axis
• Outliers: Lines isolated from
clusters and lines with
different paths

© PADS (use only with permission & acknowledgements)


Advanced visualization - Parallel coordinate plot

• Can display a large number of


data entries
• Grows rapidly in size as the
number of attributes in the data
set grows
• Colors can be configured
• Challenge: How to arrange the
variables next to each other?
• Order does not need to respect
time / flow (like in a process
model).

© PADS (use only with permission & acknowledgements)


Advanced visualization – Sankey Diagrams
Like the parallel coordinate plot for categorical data (grouping lines)

© PADS (use only with permission & acknowledgements)


Advanced visualization - Polar area chart

• Variation of the classic pie chart


• Different to pie charts each
segment has the same angle
• The radius of the segment
displays the value of the attribute

Red Orange Yellow Green Blue


82 95 33 39 33
Advanced visualization – Stream graphs
2014 FIFA World Cup

• Displays the change in the data


over time of different categories
• Plots one ordered key attribute on
the x-axis
• Usually time Stream graph visualizing the balance of play for the
world cup match between Spain and the
• Each segment of the stream Netherlands.
represents one category Red area: The strength of the Spanish team.
• Width of stream at point x is Blue area: The strength of the Dutch team.
White line: Balance of the game between both
determined by the value of the teams.
category at time x M. Grootjans, Visualization of spatio-temporal soccer data,
2014, Master thesis at Eindhoven University of Technology
• Additional discrete events can be
shown (card, goal, free kick, etc.)
Advanced visualization – Stream graphs

• Advantages:
• Good for summarizing big datasets
• Good to discover trends and
patterns over time (seasonal
changes etc.) Stream graph visualizing the balance of play for the
• Disadvantages: world cup match between Spain and the
Netherlands.
• Might get very cluttered with large
data sets Red area: The strength of the Spanish team.
Blue area: The strength of the Dutch team.
• Categories with small values are White line: Balance of the game between both
difficult to spot teams.

• Not possible to obtain precise M. Grootjans, Visualization of spatio-temporal soccer data,


2014, Master thesis at Eindhoven University of Technology
values for the categories from the
stream
Advanced visualization - Heatmap

• Displays the data set by plotting


two categorical attributes on the
axis and coloring the segments
according to a quantitative
attribute
• Used to find clusters and outliers
• Is there a meaningful order for the
categorical values on the x-and y-
axis?

https://datavizcatalogue.com/methods/heatmap.html
Advanced visualization - Node-link tree

• Used to represent hierarchical


data
• Data points are shown as nodes
• Hierarchical relations are shown as
edges
• Root on top and all branches
below
• Very effective in visualizing tree
structures
• Tree can get very wide quickly
• Low level data points can create
a lot of unused white space
Advanced visualization – Radial node-link tree

• Radial variation of the node-link


tree
• Root in the middle and the
hierarchy spreads around it in a
radial layout

http://bl.ocks.org/mbostock/4063550
Advanced visualization – Sunburst

• Similar principle to the radial


node tree
• Each ring is a different level of
hierarchy with the center being the
root
• Each segment of a ring belongs to
one categorical value
• The size of a segment is either
divided equally under the parent
node or proportional to a value

https://datavizcatalogue.com/methods/sunburst_diagram.html
Advanced visualization – Tree map

• Displays a tree as nested


rectangles
• Each parent node of the tree is
presented as a rectangle
• Each rectangle is divided into
smaller rectangle according to
the parent’s child nodes
• The size of the rectangles is
determined by a chosen attribute
of the data set

Tree map showing Elderly population in Europe per country and its
regions. The size of the rectangle represents the total population
and the colour shows the amount of people above 65 years.
https://ncva.itn.liu.se/education-geovisual-analytics/treemap?l=en
User interaction

© PADS (use only with permission & acknowledgements)


User interaction

• As mentioned before, the user


should be able to interact with
the presented data
• Three categories of user
interaction
• Manipulate (visualization, not data)
• Facet (exploit dimensions, side-by-side)
• Reduce (change the data)

T. Munzner, Visual Design and Analysis: Abstractions, Principles and Methods


User interaction – Manipulate
(visualization, not data)

• Manipulate the visualization to


gain meaningful and relevant reorder, sort,
realign, change
insights parameter, etc.
• Change the view, parameter, etc.
• Select certain parts of the data highlight
• Navigate through the data instances
directly or based
• In comparison to the “reduce” on attributes
category these interactions just
manipulate (e.g., highlighting, zoom, pan, slice,
etc.
reordering) what is presented (the
view) instead of the data behind it.
• No data is removed/transformed.
User interaction – Manipulate
(visualization, not data)

zoom based on two attributes


User interaction – Manipulate
(visualization, not data)

change view
User interaction – Manipulate
(visualization, not data)
User interaction – Facet
(exploit dimensions, side-by-side)
Linked highlighting
• Obtain a different view of the data or views (e.g., pie
to be able to present/compare chart and process
model or
relevant parts of the data geographic map
• Juxtapose: Putting two views next and bar chart).
to each other
• Partition: Split the view
• Superimpose: Overlay two views Split instances.

Different elements
are joined in layers
(e.g., locations of
vehicles and
locations of people)
User interaction – Facet
(exploit dimensions, side-by-side)

Juxtapose: Putting two views


next to each other
User interaction – Reduce
(change the data)

• Reduce the data and consequently


Select time window,
alter its visualization to only value range, region,
display the relevant part and the etc.
correct level of detail
• Filter
• Aggregate
Change scale,
granularity, take
mean/sum, fit
parameter, PCA, etc.
User interaction – Reduce
(change the data)
User interaction – Reduce
(change the data)
Visual encoding
principles

© PADS (use only with permission & acknowledgements)


Wrong visualization choices can
influence what the user perceives
Tile A and B have the same colour

Checker shadow illusion by Edward H. Adelson (1995)

© PADS (use only with permission & acknowledgements)


Wrong visualization choices can
influence what the user perceives

© PADS (use only with permission & acknowledgements)


Wrong visualization choices can
influence what the user perceives
Both blue dots have the same size

Ebbingshaus illusion by Hermann Ebbinghaus (1901)

© PADS (use only with permission & acknowledgements)


Wrong visualization choices can
influence what the user perceives
Colour blind people might not be able to read the displayed
number

Ishihara color test by Shinobu Ishihara (1917)

© PADS (use only with permission & acknowledgements)


Visual encoding principles
• There are different channels
we can use to display our
data points and differences
between them
• Position
• Shape
• Color
• Tilt
• Size

T. Munzner, Visual Design and Analysis:


Abstractions, Principles and Methods

© PADS (use only with permission & acknowledgements)


Visual encoding principles
Magnitude Channels Identity Channels
• The definition of the
channels can be refined and
grouped into two groups
• Magnitude channels
• Identity channels
• Magnitude channels should be
categorical/unordered
used for ordered attributes
• Identity channels should be
used for categorical attributes

numeric/ordered
© PADS (use only with permission & acknowledgements)
Visual encoding principles
Best
• The channels can be ranked
in their effectiveness of
expressing differences
• Rule of thumb: encode most
important attributes with
highest ranked channel

Least
© PADS (use only with permission & acknowledgements)
Visual encoding principles
• Which plot visualizes the difference between its data points better?

T. Munzner, Visual Design and Analysis: Abstractions, Principles and Methods

© PADS (use only with permission & acknowledgements)


Visual encoding principles
• Which colored dot belongs to which state? The colors are not distinguishable.

© PADS (use only with permission & acknowledgements)


Conclusion

© PADS (use only with permission & acknowledgements)


Short summary of lecture
• Visualization is a good way to display relations in the data
• Combined with manual interaction and human perception skills
insights into the data can be gained quicker
• Visual analytics aims to support the problem solving and
decision making process
• But it’s not as easy as it seems
• What to display?
• How to display it? Which visualization technique is appropriate?
• Does the visualization influence the way results are interpreted?

© PADS (use only with permission & acknowledgements)


Relevant Literature

• Visual Analytics Scope and Challenges by D. Keim et.


al in: Visual Data Mining
• Mastering the Information Age – Solving Problems
with Visual Analytics by D. Keim et. al
• Visual Design and Analysis: Abstractions, Principles
and Methods by T. Munzner

© PADS (use only with permission & acknowledgements)


Remaining

infrastructure analysis effect


“volume and velocity” “extracting knowledge” “people, organizations, society”

Decision trees
Regression Visual analytics &
Data preprocessing
Support vector machines information visualization
Neural networks
Big data infrastructures Association rules Responsible data science
Sequence mining (fairness and confidentiality)
Clustering
Process mining
© PADS (use only with permission & acknowledgements)
Text mining
# Lecture date day
Lecture 1 Introduction 28/10/2020 Wednesday
Instruction 1 Python 29/10/2020 Thursday
Instruction 2 Crash Course in Python 30/10/2020 Friday
Lecture 2 Basic data visualization/exploration 04/11/2020 Wednesday
Lecture 3 Decision trees 05/11/2020 Thursday
Instruction 3 Decision trees and data visualization/exploration 06/11/2020 Friday
Lecture 4 Regression 11/11/2020 Wednesday
Lecture 5 Support vector machines 12/11/2020 Thursday
Instruction 4 Regression and support vector machines 13/11/2020 Friday
Lecture 6 Neural networks (1/2) 18/11/2020 Wednesday
Lecture 7 Neural networks (2/2) 19/11/2020 Thursday
Instruction 5
Lecture 15 Text mining (1/2)
Neural networks 20/11/2020 Friday
06/01/2021 Wednesday
Lecture 8 Evaluation of supervised learning problems 25/11/2020 Wednesday
Instruction 6
Lecture 16 Text mining (2/2)
Neural networks and evaluation 27/11/2020 Friday
07/01/2021 Thursday
Lecture 9 Clustering 02/12/2020 Wednesday
Lecture 10 Instruction 10 Process mining
Frequent items sets 03/12/2020 Thursday 08/01/2021 Friday
Instruction 7 Clustering and frequent item sets 04/12/2020 Friday
Lecture 11 Lecture 17 Data preprocessing, data quality, binning, etc.
Association rules 09/12/2020 Wednesday 13/01/2021 Wednesday
Lecture 12 Sequence mining 10/12/2020 Thursday
Instruction 8 Lecture 18 Visual analytics & information visualization
Association rules and sequence mining 11/12/2020 Friday 14/01/2021 Thursday
Lecture 13 Process mining (unsupervised) 16/12/2020 Wednesday
Lecture 14 Instruction 11 Text mining
Process mining (supervised) 17/12/2020 Thursday 15/01/2021 Friday
Instruction 9 Q&A assignment 1 18/12/2020 Friday
Lecture 15 Lecture 19 Responsible data science (1/2)
Text mining (1/2) 06/01/2021 Wednesday 20/01/2021 Wednesday
Lecture 16 Text mining (2/2) 07/01/2021 Thursday
Instruction 10 Lecture 20 Responsible data science (2/2)
Process mining 08/01/2021 Friday 21/01/2021 Thursday
Lecture 17 Data preprocessing, data quality, binning, etc. 13/01/2021 Wednesday
Lecture 18 Instruction 12 Preprocessing and visualization
Visual analytics & information visualization 14/01/2021 Thursday 22/01/2021 Friday
Instruction 11 Text mining 15/01/2021 Friday
Lecture 19 Lecture 21 Big data
Responsible data science (1/2) 20/01/2021 Wednesday 27/01/2021 Wednesday
Lecture 20 Responsible data science (2/2) 21/01/2021 Thursday
Instruction 12 Instruction 13 Responsible data science
Preprocessing and visualization 22/01/2021 Friday 28/01/2021 Thursday
Lecture 21 Big data 27/01/2021 Wednesday
Instruction 13
Instruction 14 Big data (1/2)
Responsible data science 28/01/2021 Thursday
29/01/2021 Friday
Instruction 14 Big data (1/2) 29/01/2021 Friday
Lecture 22
Instruction 15
Closing
Lecture 22 Closing
Big data (2/2)
03/02/2021
04/02/2021
Wednesday
Thursday
03/02/2021 Wednesday
Instruction 16 Q&A assignment 2 05/02/2021 Friday
Instruction 17 Example exam questions 10/02/2021 Wednesday
Instruction 18 Questions 11/02/2021 Thursday
© PADS (use only with permission & acknowledgements)

You might also like