Scientific Design Choices in Data Visualization

Scientific Design Choices in Data Visualization: Graphical Visualization
Definition:
Graphical visualization is an area of mathematics and computer science, at the
intersection of geometric graph theory and information visualization. It is concerned with
the visual representation of graphs that reveals structures and anomalies that may be present
in the data and helps the user to understand and reason about the graphs.
Overview:
Graphical visualization is concerned with visual representations of graph or network data.
Effective graphical visualization reveals structures that may be present in the graphs and
helps users to understand and analyze the underlying data.
A graph consists of nodes and edges. It is a mathematical structure describing relations
among a set of entities, where a node represents an entity, and an edge exists between
two nodes if the two corresponding entities are related.
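The node-and-edge definition above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed representation: the entity names, the `relations` list, and the `build_graph` helper are all hypothetical, and the graph is assumed to be undirected.

```python
from collections import defaultdict

# Hypothetical "is related to" pairs; the names are illustrative only.
relations = [("Ann", "Bob"), ("Bob", "Cy"), ("Ann", "Cy"), ("Cy", "Dee")]


def build_graph(pairs):
    """Return an undirected graph as {node: set of neighbouring nodes}."""
    graph = defaultdict(set)
    for a, b in pairs:
        # An edge exists between a and b because the two entities are related.
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


graph = build_graph(relations)
nodes = sorted(graph)
# Each undirected edge is stored twice (once per endpoint), hence the halving.
edge_count = sum(len(neighbours) for neighbours in graph.values()) // 2
```

An adjacency structure like this is one common starting point for the layout and analysis steps that a graph visualization would build on.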
Users may attempt to understand or communicate eight types of quantitative messages from
a set of data, each with associated graphs that help communicate the message:
1. Time-series: A single variable is captured over a period of time, such as the
unemployment rate over a 10-year period. A line chart may be used to demonstrate
the trend.
2. Ranking: Categorical subdivisions are ranked in ascending or descending order,
such as a ranking of sales performance (the measure) by salesperson
(the category, with each salesperson a categorical subdivision) during a single
period. A bar chart may be used to show the comparison across the salespersons.
3. Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a
percentage out of 100%). A pie chart or bar chart can show the comparison of ratios,
such as the market share represented by competitors in a market.
4. Deviation: Categorical subdivisions are compared against a reference, such as a
comparison of actual vs. budget expenses for several departments of a business for
a given time period. A bar chart can show comparison of the actual versus the
reference amount.
5. Frequency distribution: Shows the number of observations of a particular variable for
a given interval, such as the number of years in which the stock market return falls
within intervals such as 0-10%, 11-20%, etc. A histogram, a type of bar chart, may
be used for this analysis. A boxplot helps visualize key statistics about the
distribution, such as median, quartiles, outliers, etc.
6. Correlation: Comparison between observations represented by two variables (X,Y) to
determine if they tend to move in the same or opposite directions, for example by
plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is
typically used for this message.
7. Nominal comparison: Comparing categorical subdivisions in no particular order, such
as the sales volume by product code. A bar chart may be used for this comparison.
8. Geographic or geospatial: Comparison of a variable across a map or layout, such as
the unemployment rate by state or the number of persons on the various floors of a
building. A cartogram is a typical graphic used.
Analysts reviewing a set of data may consider whether some or all of the messages and
graphic types above are applicable to their task and audience. The process of trial and error
to identify meaningful relationships and messages in the data is part of exploratory data
analysis.

Chart types: name, visual dimensions, and description / example usages

Bar chart
Visual dimensions: length/count, category, color
Figure: Bar chart of tips by day of week
Description / example usages:
• Presents categorical data with rectangular bars with heights or lengths proportional to
the values that they represent. The bars can be plotted vertically or horizontally.
• A bar graph shows comparisons among discrete categories. One axis of the chart shows
the specific categories being compared, and the other axis represents a measured value.
• Some bar graphs present bars clustered in groups of more than one, showing the values
of more than one measured variable. These clustered groups can be differentiated using
color.
• For example, a comparison of values, such as sales performance for several persons or
businesses in a single time period.

Histogram
Visual dimensions: bin limits, count/length, color
Figure: Histogram of housing prices
Description / example usages:
• An approximate representation of the distribution of numerical data. Divide the entire
range of values into a series of intervals and then count how many values fall into each
interval; this is called binning. The bins are usually specified as consecutive,
non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are
often (but not required to be) of equal size.
• For example, determining the frequency of annual stock market percentage returns within
particular ranges (bins) such as 0-10%, 11-20%, etc. The height of the bar represents
the number of observations (years) with a return % in the range represented by the
respective bin.
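The binning step described for histograms can be sketched as follows. This is an illustrative sketch only: `bin_counts` is a hypothetical helper, the returns are made-up numbers, and the bins are assumed to be adjacent half-open intervals (0 to 10, 10 to 20, 20 to 30) with the last bin closed on the right, rather than the 0-10%, 11-20% labels used in the prose.

```python
def bin_counts(values, edges):
    """Count values per bin; bins are [edges[i], edges[i+1]), last bin closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break  # each value falls into at most one bin
    return counts


# Hypothetical annual returns in percent (invented for illustration).
returns = [3.0, 12.5, 7.1, 18.0, 25.0, 9.9, 14.2, 30.0]
edges = [0, 10, 20, 30]
counts = bin_counts(returns, edges)  # bar heights of the histogram
```

Each entry of `counts` corresponds to the height of one bar; values outside the outermost edges are simply not counted in this sketch.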

Scatter plot
Visual dimensions: x position, y position, symbol/glyph, color, size
Figure: Basic scatterplot of two variables
Description / example usages:
• Uses Cartesian coordinates to display values for typically two variables for a set of
data.
• Points can be coded via color, shape and/or size to display additional variables.
• Each point on the plot has an associated x and y term that determines its location on
the Cartesian plane.
• Scatter plots are often used to highlight the correlation between variables (x and y).
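A scatter plot shows correlation visually; the same tendency can be summarized numerically. The sketch below computes the Pearson correlation coefficient, which is one common measure and an assumption here, since the text does not name a specific statistic. The `pearson_r` helper and the monthly figures are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: +1 same direction, -1 opposite, near 0 no linear trend."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented monthly values: unemployment (X) and inflation (Y), in percent.
unemployment = [4.0, 5.0, 6.0, 7.0, 8.0]
inflation = [3.5, 3.0, 2.4, 2.1, 1.5]
r = pearson_r(unemployment, inflation)  # strongly negative for this sample
```

A value of `r` near -1 for this made-up sample corresponds to points falling along a downward-sloping line in the scatter plot.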

Scatter plot (3D)
Visual dimensions: position x, position y, position z, color, symbol, size
Figure: Scatter plot
Description / example usages:
• Similar to the 2-dimensional scatter plot above, the 3-dimensional scatter plot
visualizes the relationship between typically 3 variables from a set of data.
• Again, points can be coded via color, shape and/or size to display additional variables.

Network
Visual dimensions: node size, node color, tie thickness, tie color, spatialization
Figure: Network analysis
Description / example usages:
• Finding clusters in the network (e.g. grouping Facebook friends into different
clusters).
• Discovering bridges (information brokers or boundary spanners) between clusters in the
network.
• Determining the most influential nodes in the network (e.g. a company wants to target a
small group of people on Twitter for a marketing campaign).
• Finding outlier actors who do not fit into any cluster or are in the periphery of a
network.
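As a rough illustration of "most influential" and "peripheral" nodes, the sketch below counts ties per node (degree). Degree is only one simple proxy for influence and is an assumption of this example; the text does not specify a measure. The `ties` list and `degree_counts` helper are hypothetical.

```python
# Hypothetical undirected ties between accounts (invented for illustration).
ties = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]


def degree_counts(edges):
    """Return {node: number of ties touching that node}."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


degree = degree_counts(ties)
# Node with the most ties: a simple stand-in for "most influential".
most_connected = max(degree, key=degree.get)
# Nodes with a single tie sit on the periphery of this small network.
peripheral = sorted(node for node, d in degree.items() if d == 1)
```

In a drawing of this network, the degree values could be mapped to node size so that well-connected nodes stand out.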

Pie chart
Visual dimensions: color
Figure: Pie chart
Description / example usages:
• Represents one categorical variable which is divided into slices to illustrate numerical
proportion. In a pie chart, the arc length of each slice (and consequently its central
angle and area) is proportional to the quantity it represents.
• For example, as shown in the accompanying figure, the proportion of English native
speakers worldwide.
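The proportionality between a slice and its quantity can be made concrete by computing central angles. This is a minimal sketch; `slice_angles` is a hypothetical helper and the market shares are invented.

```python
def slice_angles(quantities):
    """Map each category to its central angle in degrees (angles sum to 360)."""
    total = sum(quantities.values())
    return {label: 360.0 * value / total for label, value in quantities.items()}


# Invented market shares for three competitors.
shares = {"A": 50, "B": 30, "C": 20}
angles = slice_angles(shares)  # A gets half the circle, and so on
```

Because arc length and area scale with the central angle, the same ratios carry over to every visual property of the slice.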

Line chart
Visual dimensions: x position, y position, symbol/glyph, color, size
Figure: Line chart
Description / example usages:
• Represents information as a series of data points called 'markers' connected by straight
line segments.
• Similar to a scatter plot except that the measurement points are ordered (typically by
their x-axis value) and joined with straight line segments.
• Often used to visualize a trend in data over intervals of time (a time series), thus the
line is often drawn chronologically.

Streamgraph
Visual dimensions: width, color, time (flow)
Figure: Streamgraph
Description / example usages:
• A type of stacked area graph which is displaced around a central axis, resulting in a
flowing shape.
• Unlike a traditional stacked area graph in which the layers are stacked on top of an
axis, in a streamgraph the layers are positioned to minimize their "wiggle".
• Streamgraphs display data with only positive values, and are not able to represent both
negative and positive values.
• For example, the accompanying figure shows the music listened to by a user over the
start of the year 2012.

Treemap
Visual dimensions: size, color
Figure: Treemap
Description / example usages:
• A method for displaying hierarchical data using nested figures, usually rectangles.
• For example, disk space by location / file type.

Gantt chart
Visual dimensions: color, time (flow)
Figure: Gantt chart
Description / example usages:
• A type of bar chart that illustrates a project schedule.
• Modern Gantt charts also show the dependency relationships between activities and
current schedule status.
• For example, used in project planning.

Heat map
Visual dimensions: color, categorical variable
Figure: Heat map
Description / example usages:
• Represents the magnitude of a phenomenon as color in two dimensions.
• There are two categories of heat maps:
  o Cluster heat map: magnitudes are laid out into a matrix of fixed cell size whose rows
  and columns are categorical data. For example, the accompanying figure.
  o Spatial heat map: there is no matrix of fixed cell size. For example, a heat map
  showing population densities displayed on a geographical map.

Stripe graphic
Visual dimensions: x position, color
Figure: Stripe graphic
Description / example usages:
• Uses a series of colored stripes chronologically ordered to visually portray long-term
temperature trends.
• Portrays a single variable, prototypically temperature over time, to portray global
warming.
• Deliberately minimalist, with no technical indicia, to communicate intuitively with
non-scientists [31].
• Can be "stacked" to represent plural series.

Box and whisker plot
Visual dimensions: x axis, y axis
Figure: Box and whisker plot
Description / example usages:
• A method for graphically depicting groups of numerical data through their quartiles.
• Box plots may also have lines extending from the boxes (whiskers) indicating variability
outside the upper and lower quartiles.
• Outliers may be plotted as individual points.
• The two boxes graphed on top of each other represent the middle 50% of the data, with
the line separating the two boxes identifying the median data value, and the top and
bottom edges of the boxes representing the 75th and 25th percentile data points
respectively.
• Box plots are non-parametric: they display variation in samples of a statistical
population without making any assumptions about the underlying statistical distribution,
and are thus useful for getting an initial understanding of a data set. For example,
comparing the distribution of ages between groups of people (e.g. males and females).
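The statistics a box plot draws can be computed directly. The sketch below uses Python's standard `statistics.quantiles` for the quartiles. Two choices are assumptions of this example and not taken from the text: the "inclusive" quantile method, and the common 1.5 x IQR rule for deciding where whiskers end and which points count as outliers. The `box_plot_summary` helper and the ages are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return quartiles, whisker ends and outliers for one group of numbers."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box: the middle 50% of the data
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Invented ages for one group of people.
ages = [21, 25, 28, 30, 34, 39, 45, 52, 60, 110]
summary = box_plot_summary(ages)
```

Running the same function on a second group would give the numbers needed to draw two boxes side by side for comparison.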

Flowchart
Visual dimensions: workflow or process
Figure: Flowchart
Description / example usages:
• Represents a workflow, process or a step-by-step approach to solving a task.
• The flowchart shows the steps as boxes of various kinds, and their order by connecting
the boxes with arrows.
• For example, outlining the actions to undertake if a lamp is not working, as shown in
the accompanying diagram.

Radar chart
Visual dimensions: attributes, value assigned to attributes
Figure: Radar chart
Description / example usages:
• Displays multivariate data in the form of a two-dimensional chart of three or more
quantitative variables represented on axes starting from the same point.
• The relative position and angle of the axes is typically uninformative, but various
heuristics, such as algorithms that plot data as the maximal total area, can be applied
to sort the variables (axes) into relative positions that reveal distinct correlations,
trade-offs, and a multitude of other comparative measures.
• For example, comparing attributes/skills (e.g. communication, analytical, IT skills)
learnt across different university degrees (e.g. mathematics, economics, psychology).
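To show how values on axes "starting from the same point" become a shape, the sketch below converts attribute scores to polygon vertices. It assumes equally spaced axes and a shared maximum score, neither of which is stated in the text; `radar_vertices` and the skill scores are hypothetical.

```python
from math import cos, pi, sin


def radar_vertices(scores, max_score):
    """Place each attribute on its own axis; return (attribute, x, y) tuples."""
    n = len(scores)
    points = []
    for i, (attribute, value) in enumerate(scores.items()):
        angle = 2 * pi * i / n      # axes spread evenly around the origin
        radius = value / max_score  # distance from the shared starting point
        points.append((attribute, radius * cos(angle), radius * sin(angle)))
    return points


# Invented skill scores (0 to 4) for one degree programme.
scores = {"communication": 4, "analytical": 2, "IT skills": 3}
vertices = radar_vertices(scores, max_score=4)
```

Joining the vertices in order gives the polygon for one series; reordering the dictionary keys changes the axis order and therefore the shape, which is why axis ordering heuristics matter.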

Venn diagram
Visual dimensions: all possible logical relations between a finite collection of
different sets
Figure: Venn diagram
Description / example usages:
• Shows all possible logical relations between a finite collection of different sets.
• These diagrams depict elements as points in the plane, and sets as regions inside closed
curves.
• A Venn diagram consists of multiple overlapping closed curves, usually circles, each
representing a set.
• The points inside a curve labelled S represent elements of the set S, while points
outside the boundary represent elements not in the set S. This lends itself to intuitive
visualizations; for example, the set of all elements that are members of both sets S and
T, denoted S ∩ T and read "the intersection of S and T", is represented visually by the
area of overlap of the regions S and T. In Venn diagrams, the curves are overlapped in
every possible way, showing all possible relations between the sets.
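The regions of a two-set Venn diagram map directly onto set operations. A minimal sketch using Python's built-in sets follows; the element values and the `venn_regions` helper are hypothetical.

```python
def venn_regions(s, t):
    """Split two sets into the three regions of a two-circle Venn diagram."""
    return {
        "only_s": s - t,  # inside S, outside T
        "both": s & t,    # the overlap, S ∩ T
        "only_t": t - s,  # inside T, outside S
    }


# Invented element sets.
S = {1, 2, 3, 4}
T = {3, 4, 5}
regions = venn_regions(S, T)
```

The size of each region could be used to label the corresponding area of the diagram.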
