Professional Documents
Culture Documents
• Lots of (Big) data, but raw data, has no value, only extracted information
has.
• Not just known unknowns, also unknown unknowns.
• Visual exploration is a vital first step (understanding, spotting data quality
problems, building trust, etc.).
• Combination of visualization and automated learning is often superior.
Étienne-Jules
Marey 1885.
TGV today
(less than 2 hours)
music-timeline.appspot.com
© PADS (use only with permission & acknowledgements) Based on album and artist statistics aggregated from Google Play Music
Classical Visualization Examples
Same problems
𝒉𝒉 𝟐𝟐𝟐𝟐
𝝅𝝅𝒓𝒓𝟐𝟐 𝒉𝒉
visualization
(un)supervised
learning
Sophisticated
learning techniques
can be amazingly
stupid!
Visualization cannot be
used to find concurrency
in processes.
We see weekly patterns and drifts, but cannot see the process.
We see the process, but cannot see weekly patterns and drifts.
Details on Demand”
© PADS (use only with permission & acknowledgements)
Visual Analytics
Examples
D. Keim et. al, Visual Analytics Scope and Challenges in: Visual Data Mining, Springer, 2008
worse better
than rest than rest
but worse
than rest positive
now
time
911 calls
Visual Analytics - Examples
https://www.economist.com/briefing/2015/09/12/strangers-in-strange-lands
Sankey diagram
immigration data 2015
http://www.sankey-diagrams.com/tag/building/
Visual Analytics - Examples
Sankey diagram plastic
streams 2015
https://www.plasticsrecyclers.eu/sites/d
efault/files/PRE_blueprint%20packaging
%20waste_Final%20report%202017.pdf
Periodic Table of Visualization Methods
https://www.visual-literacy.org/periodic_table/periodic_table.html
Visual Analytics
Challenges
is challenging
© PADS (use only with permission & acknowledgements)
Advanced Visualization
Techniques
A B C
50 100 2.0
30 115 0.5
https://datavizcatalogue.com/methods/parallel_coordinates.html
• Interpretation
• Positive Correlation: Cluster
of similar lines
• Negative Correlation: No
cluster but similar crossing
point of axis
• Outliers: Lines isolated from
clusters and lines with
different paths
• Advantages:
• Good for summarizing big datasets
• Good to discover trends and
patterns over time (seasonal
changes etc.) Stream graph visualizing the balance of play for the
• Disadvantages: world cup match between Spain and the
Netherlands.
• Might get very cluttered with large
data sets Red area: The strength of the Spanish team.
Blue area: The strength of the Dutch team.
• Categories with small values are White line: Balance of the game between both
difficult to spot teams.
https://datavizcatalogue.com/methods/heatmap.html
Advanced visualization - Node-link tree
http://bl.ocks.org/mbostock/4063550
Advanced visualization – Sunburst
https://datavizcatalogue.com/methods/sunburst_diagram.html
Advanced visualization – Tree map
Tree map showing Elderly population in Europe per country and its
regions. The size of the rectangle represents the total population
and the colour shows the amount of people above 65 years.
https://ncva.itn.liu.se/education-geovisual-analytics/treemap?l=en
User interaction
change view
User interaction – Manipulate
(visualization, not data)
User interaction – Facet
(exploit dimensions, side-by-side)
Linked highlighting
• Obtain a different view of the data or views (e.g., pie
to be able to present/compare chart and process
model or
relevant parts of the data geographic map
• Juxtapose: Putting two views next and bar chart).
to each other
• Partition: Split the view
• Superimpose: Overlay two views Split instances.
Different elements
are joined in layers
(e.g., locations of
vehicles and
locations of people)
User interaction – Facet
(exploit dimensions, side-by-side)
numeric/ordered
© PADS (use only with permission & acknowledgements)
Visual encoding principles
Best
• The channels can be ranked
in their effectiveness of
expressing differences
• Rule of thumb: encode most
important attributes with
highest ranked channel
Least
© PADS (use only with permission & acknowledgements)
Visual encoding principles
• Which plot visualizes the difference between its data points better?
Decision trees
Regression Visual analytics &
Data preprocessing
Support vector machines information visualization
Neural networks
Big data infrastructures Association rules Responsible data science
Sequence mining (fairness and confidentiality)
Clustering
Process mining
© PADS (use only with permission & acknowledgements)
Text mining
# Lecture date day
Lecture 1 Introduction 28/10/2020 Wednesday
Instruction 1 Python 29/10/2020 Thursday
Instruction 2 Crash Course in Python 30/10/2020 Friday
Lecture 2 Basic data visualization/exploration 04/11/2020 Wednesday
Lecture 3 Decision trees 05/11/2020 Thursday
Instruction 3 Decision trees and data visualization/exploration 06/11/2020 Friday
Lecture 4 Regression 11/11/2020 Wednesday
Lecture 5 Support vector machines 12/11/2020 Thursday
Instruction 4 Regression and support vector machines 13/11/2020 Friday
Lecture 6 Neural networks (1/2) 18/11/2020 Wednesday
Lecture 7 Neural networks (2/2) 19/11/2020 Thursday
Instruction 5
Lecture 15 Text mining (1/2)
Neural networks 20/11/2020 Friday
06/01/2021 Wednesday
Lecture 8 Evaluation of supervised learning problems 25/11/2020 Wednesday
Instruction 6
Lecture 16 Text mining (2/2)
Neural networks and evaluation 27/11/2020 Friday
07/01/2021 Thursday
Lecture 9 Clustering 02/12/2020 Wednesday
Lecture 10 Instruction 10 Process mining
Frequent items sets 03/12/2020 Thursday 08/01/2021 Friday
Instruction 7 Clustering and frequent item sets 04/12/2020 Friday
Lecture 11 Lecture 17 Data preprocessing, data quality, binning, etc.
Association rules 09/12/2020 Wednesday 13/01/2021 Wednesday
Lecture 12 Sequence mining 10/12/2020 Thursday
Instruction 8 Lecture 18 Visual analytics & information visualization
Association rules and sequence mining 11/12/2020 Friday 14/01/2021 Thursday
Lecture 13 Process mining (unsupervised) 16/12/2020 Wednesday
Lecture 14 Instruction 11 Text mining
Process mining (supervised) 17/12/2020 Thursday 15/01/2021 Friday
Instruction 9 Q&A assignment 1 18/12/2020 Friday
Lecture 15 Lecture 19 Responsible data science (1/2)
Text mining (1/2) 06/01/2021 Wednesday 20/01/2021 Wednesday
Lecture 16 Text mining (2/2) 07/01/2021 Thursday
Instruction 10 Lecture 20 Responsible data science (2/2)
Process mining 08/01/2021 Friday 21/01/2021 Thursday
Lecture 17 Data preprocessing, data quality, binning, etc. 13/01/2021 Wednesday
Lecture 18 Instruction 12 Preprocessing and visualization
Visual analytics & information visualization 14/01/2021 Thursday 22/01/2021 Friday
Instruction 11 Text mining 15/01/2021 Friday
Lecture 19 Lecture 21 Big data
Responsible data science (1/2) 20/01/2021 Wednesday 27/01/2021 Wednesday
Lecture 20 Responsible data science (2/2) 21/01/2021 Thursday
Instruction 12 Instruction 13 Responsible data science
Preprocessing and visualization 22/01/2021 Friday 28/01/2021 Thursday
Lecture 21 Big data 27/01/2021 Wednesday
Instruction 13
Instruction 14 Big data (1/2)
Responsible data science 28/01/2021 Thursday
29/01/2021 Friday
Instruction 14 Big data (1/2) 29/01/2021 Friday
Lecture 22
Instruction 15
Closing
Lecture 22 Closing
Big data (2/2)
03/02/2021
04/02/2021
Wednesday
Thursday
03/02/2021 Wednesday
Instruction 16 Q&A assignment 2 05/02/2021 Friday
Instruction 17 Example exam questions 10/02/2021 Wednesday
Instruction 18 Questions 11/02/2021 Thursday
© PADS (use only with permission & acknowledgements)