John W. Tukey
The vaccination debate was all the rage again. “Pro-vaxxers” were loudly
proclaiming that everyone should get vaccinated and discussing the science
behind it, and “anti-vaxxers” were casting their doubts and still refusing to
get vaccinated for personal reasons. Around that time,
3 Minutes 15 Seconds
In 1812, Napoleon marched to Moscow in order to conquer the city. 98% of his
soldiers died.
The Parisian engineer Charles Minard’s visualization still inspires those who see it to
ponder the true cost of war.
It displays six types of data in two dimensions:
1. the number of Napoleon's troops; 2. the distance traveled; 3. temperature;
4. latitude and longitude; 5. direction of travel; 6. location relative to specific dates
Florence Nightingale was a Data nerd who provided insights
1855. The Crimea. Britain is fighting a battle with both Russia and disease. As a nurse, how do you convince an
army to invest in hospitals and healthcare instead of guns and ammunition?
Florence Nightingale told her story with data by showing the staggering number of deaths due to preventable
disease (shown in blue/grey).
After this visualization, sanitation became a major priority for the British Army.
http://livelovetravelwork.com/florence-nightingale-data-nerd/
https://understandinguncertainty.org/coxcombs
Assignment 1.
Chart of Biography – Joseph Priestley
Data/Processes Algorithm
Image Perception
A configuration or pattern of elements so unified as a whole that it cannot be described merely as a sum of its parts
Prepared by :
• Gestalt refers to the patterns that you perceive when presented with few graphical elements.
• Gestalt principles describe how our mind organizes individual visual elements into groups, to make sense of the entire visual
principles of gestalt
Continuity
Closure
Validate:
Which chart is suitable based on precognition concept in
DV in the example below?
[Figure: two bar charts of # Students by Grades. Data (Series1): A 150, B 150, C 125, D 50, E 75, F 50.]
The Visual Process
1) Who and what is involved. Give a visual summary of the people
and things you are going to be talking about.
2) How many are involved. Next, provide a quantitative measure (or
many measures) of the people or things. Changes in number
(trends) are particularly revealing.
3) Where the pieces are located. Present a map illustrating the
relative position of these people or things according to
geographical or conceptual coordinates.
4) When things occur. Show a timeline that illustrates the sequence
in which these people or things interact, or the steps required to
bring them into alignment.
5) How things impact each other. Provide a flowchart that adds
cause-and-effect influences superimposed on any (or all) of your
previous pictures; show the change and how you will achieve it.
6) Why this matters. Complete your visual story with a concluding
“visual equation” that summarizes the key learnings, takeaways,
or action items triggered by the previous visual insights.
Start thinking visually
• Consider the Nature and Purpose of your visualization. Ask
these two questions:
NATURE: Is the information conceptual or data-driven?
PURPOSE: Am I declaring something or exploring something?
Is it Conceptual or Data Driven?
1. We will talk about the organizational structure of the firm.
2. We will talk about the past two years’ revenue trends.
Start thinking visually
• Consider the Nature and Purpose of your visualization. Ask
these two questions:
PURPOSE: Am I declaring something or exploring something?
Is it Declaration or Exploration?
Purpose: To understand why the sales team’s performance has lagged lately.
The Dos and Don’ts of Data Visualization
• Time axis. When using time in charts, set it on the horizontal axis. Time should
run from left to right. Do not skip values (time periods), even if there are no
values.
• Proportional values. The numbers in a chart (displayed as bar, area, bubble, or
other physically measured element in the chart) should be directly proportional
to the numerical quantities presented.
• Data-Ink Ratio. Remove any excess information, lines, colors, and text that
do not add value to the chart.
• Sorting. For column and bar charts, to enable easier comparison, sort your data
in ascending or descending order by the value, not alphabetically. This applies
also to pie charts.
• Legend. You don’t need a legend if you have only one data category.
• Labels. Use labels directly on the line, column, bar, pie, etc., whenever possible,
to avoid indirect look-up.
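The time-axis and sorting rules above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names `fill_missing_months` and `sort_by_value`, and all of the sales and region figures, are hypothetical and do not come from the slides.

```python
from datetime import date


def fill_missing_months(values, start, end):
    """Return (month, value) pairs for every month from start to end.

    Months absent from `values` stay on the axis with value None,
    so the time axis never skips a period.
    """
    filled = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        key = date(year, month, 1)
        filled.append((key, values.get(key)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return filled


def sort_by_value(categories, descending=True):
    """Sort (label, value) pairs by value rather than alphabetically."""
    return sorted(categories, key=lambda pair: pair[1], reverse=descending)


# Hypothetical monthly sales with no record for February 2024.
sales = {date(2024, 1, 1): 120, date(2024, 3, 1): 95}
axis = fill_missing_months(sales, date(2024, 1, 1), date(2024, 3, 1))

# Hypothetical category totals, listed alphabetically before sorting.
regions = [("East", 40), ("North", 75), ("South", 20), ("West", 55)]
ranked = sort_by_value(regions)
```

Here `axis` keeps February as an explicit empty slot, and `ranked` orders the bars from largest to smallest, which is usually what a plotting library should receive for a bar or column chart.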
The Dos and Don’ts of Data Visualization
• Inflation adjustment. When using monetary values in a long-term series, make sure
to adjust for inflation.
• Colors. In any chart, don’t use more than six colors.
• Colors. For comparing the same value at different time periods, use the same color
in a different intensity (from light to dark).
• Colors. For different categories, use different colors. The most widely used colors are
black, white, red, green, blue, and yellow.
• Colors. Keep the same color palette or style for all charts in the series, and same axes
and labels for similar charts to make your charts consistent and easy to compare.
• Colors. Check how your charts would look when printed out in grayscale. If you
cannot distinguish color differences, you should change hue and saturation of colors.
• Colors. Seven to 10 percent of men have color deficiency. Keep that in mind when
creating charts, ensuring they are readable for color-blind people. Use Vischeck to
test your images. Or, try to use color palettes that are friendly to color-blind people.
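The inflation-adjustment rule above amounts to one line of arithmetic: multiply each nominal amount by the ratio of the base-year price index to that year's index. The sketch below assumes a price index is available per year; the function name `adjust_for_inflation` and every number in it are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def adjust_for_inflation(nominal, price_index, base_year):
    """Convert nominal amounts into base-year money.

    nominal:     {year: amount in that year's money}
    price_index: {year: price index for that year}
    """
    base = price_index[base_year]
    return {
        year: amount * base / price_index[year]
        for year, amount in nominal.items()
    }


# Hypothetical figures for illustration only (not real index values).
nominal_revenue = {2000: 100.0, 2010: 150.0, 2020: 200.0}
price_index = {2000: 50.0, 2010: 75.0, 2020: 100.0}

real_revenue = adjust_for_inflation(nominal_revenue, price_index, base_year=2020)
```

In this made-up series the nominal revenue doubles, but every year comes out at 200.0 in 2020 money, so a chart of the unadjusted values would show growth that is entirely due to price changes.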
The Dos and Don’ts of Data Visualization
• Data Complexity. Don’t add too much information to a single chart. If necessary,
split data in two charts, use highlighting, simplify colors, or change chart type.
Rate this chart!
Ethics in visualization
Validate Dos and Don’ts
Even with labeled %, the pie sizes for Person A, B and C appear to be the same.
Even without the % label, the message is unambiguous.
https://venngage.com/blog/misleading-graphs/
Pie?
Wrong Choice of Axis
No. of Products   Sales   Market Share
14                11200   13
20                60000   23
18                14400   5
[Figure: charts of the data above; only axis tick labels (0 to 70000, 0 to 30, 0 to 25) survive.]
Wrong Choice of Axis
            F2F  Phone  Text  IM  IRC  Mail  Blogs  Feeds  Twitter
Immediacy   40   40     20    30  30   10    5      10     10
Lifespan    3    3      10    10  10   30    40     10     10
Audience    3    3      3     10  20   20    40     30     30
Immediacy, Lifespan, and Audience scores were assigned arbitrarily in the range 0-40 to a
bunch of communication modes.
Beautiful Vs Effective
https://www.tableau.com/about/blog/2016/4/examining-data-viz-rules-dont-use-red-green-together-53463
ACCENT Principles for effective graphical display
• Apprehension: Are you able to correctly perceive relations among
variables?
• Clarity: Are the most important elements or relations visually most
prominent?
• Consistency: Are the elements, symbol shapes and colors consistent with
their use in previous graphs?
• Efficiency: Is the graph easy to interpret?
• Necessity: Is the graph a more useful way to represent the data than
alternatives (table, text)?
• Truthfulness: Are the graph elements accurately positioned and scaled,
by their magnitude, relative to the implicit or explicit scale?
http://www.datavis.ca/gallery/accent.php http://www.datavis.ca/gallery/index.php
Variables: Price, Gear Ratio, Turning Circle, etc.
Larger values represent "better" for all variables.
All variables are first scaled to a 0-1 range.
Variables are arranged around the circle by a multivariate effect
ordering according to their order on the largest discriminant
dimension.
The error bars next to each radial axis show the smallest
difference between means required for a (univariate) .05
significant difference.
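The 0-1 scaling step described above can be sketched as ordinary min-max scaling. The slide does not say how variables such as Price were oriented so that larger means "better", so reversing the scaled value with `1 - s` is an assumption made here for illustration. The function name `scale_unit_range` and the price figures are hypothetical.

```python
def scale_unit_range(values, larger_is_better=True):
    """Min-max scale values to the 0-1 range.

    When larger_is_better is False the scale is reversed, so that 1.0
    always marks the "best" observation (an assumption, see lead-in).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a variable with no spread")
    scaled = [(v - lo) / (hi - lo) for v in values]
    if not larger_is_better:
        scaled = [1.0 - s for s in scaled]
    return scaled


# Hypothetical prices: a lower price is treated as "better".
prices = [4000, 6000, 12000]
scaled_price = scale_unit_range(prices, larger_is_better=False)
```

After this step every variable shares the same 0-1 axis, which is what lets a radial (star) display put unrelated units such as price and turning circle side by side.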
Rating 1 to 5
Apprehension: Correctly perceive relations among variables
Clarity: Are the most important elements visually most prominent
Consistency: Are the elements, symbol shapes and colors consistent
Efficiency: Is the graph easy to interpret
Necessity: Is it a more useful way to represent the data than a table
Truthfulness: Lie-factor
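Assuming "Lie-factor" here refers to Tufte's lie factor, it is the size of the effect shown in the graphic divided by the size of the effect in the data, where each effect is the relative change between two values. A value near 1 means the graphic is faithful to the data. The sketch below uses hypothetical function names and made-up measurements.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(
        data_first, data_second
    )


# Hypothetical example: the data rise from 10 to 15 (a 50% increase),
# but the bars are drawn 20 mm and 60 mm tall (a 200% increase).
factor = lie_factor(20, 60, 10, 15)
```

In this made-up case the graphic overstates the change by a factor of 4, which would earn a low Truthfulness rating.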
The goal of the graphic was to present results of a poll of
happiness from the World Values Survey project of people
throughout the world in relation to economic status, as
measured by GNP per capita.
Rating 1 to 5
Apprehension: Correctly perceive relations among variables
Clarity: Are the most important elements visually most prominent
Consistency: Are the elements, symbol shapes and colors consistent
Efficiency: Is the graph easy to interpret
Necessity: Is it a more useful way to represent the data than a table
Truthfulness: Lie-factor
ACCENT Principles for effective graphical display
http://www.datavis.ca/gallery/index.php
Visit this web resource and rate some good and bad graphs on a 5-point rating scale on the following dimensions.
Rating 1 to 5
Apprehension: Correctly perceive relations among variables
Clarity: Are the most important elements visually most prominent
Consistency: Are the elements, symbol shapes and colors consistent
Efficiency: Is the graph easy to interpret
Necessity: Is it a more useful way to represent the data than a table
Truthfulness: Lie-factor
Assignments and Websites
Based on your learning so far, create a presentation. Go through a website from the
list below or any other visualization learning website and pick at least one topic
which we haven’t covered in the class so far. It can even be a nice visualization
project.
You need to present it in class so that others can also learn from it. This is
your third assignment as well. Presenters will be picked at random, starting with
the December 2nd class.
Have at least 2 questions for the class to be discussed as your first or last slide.
• http://www.storytellingwithdata.com/
• http://www.edwardtufte.com/tufte/
• http://www.visualcomplexity.com/
• http://www.webdesignerdepot.com/2009/06/50-great-examples-of-data-visualization/
ADDITIONAL CONTENT
Some Visualization Tools
Tableau
Interactive
Prefuse Gnuplot
NodeXL
Pajek Weka
Orange GUI
From the Memory Lanes
Types of Data
Reliability and Validity of a measure
Data Cleaning
Assumption Testing
Inference
Nominal
Ordinal
Scale
Dimensions
Items
Factors
Questionnaire
Likert
Developing a Scale
Canon of research:
If something exists,
it can be measured
in numerals
Outliers: univariate and multivariate
Univariate outliers using box plot
Multivariate outliers using M-distance
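Both outlier checks named above can be sketched with the Python standard library. Two assumptions are made here: the box-plot rule is taken to be the usual 1.5 × IQR fences, and "M-distance" is taken to mean Mahalanobis distance. The function names and data are hypothetical, and the multivariate version is written for two variables only to keep the matrix inverse explicit.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the mean."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample variances and covariance (divide by n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # (dx, dy) times the inverse covariance matrix times (dx, dy).
        distances.append((dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det)
    return distances


# Hypothetical scores: 50 sits far above the rest.
scores = [10, 12, 12, 13, 14, 15, 16, 50]
flagged = boxplot_outliers(scores)

# Hypothetical pairs: (2, 6) is ordinary on each axis taken alone,
# but it breaks the y = x pattern that the other points follow.
pairs = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (2, 6)]
d2 = mahalanobis_sq_2d(pairs)
most_unusual = pairs[d2.index(max(d2))]
```

The second example shows why the multivariate check matters: neither 2 nor 6 would be flagged by a box plot of x or of y, yet the pair has by far the largest distance. In practice the squared distances are often compared against a chi-square cutoff with as many degrees of freedom as there are variables; no cutoff is hard-coded here.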
Some Fun
Examples
• Baby Name Wizard
http://www.babynamewizard.com/voyager
https://gramener.com/faces/
• Netflix Queues
http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html?ref=nyregion